Open Source | 2i2c

Protecting our hubs against the CopyFail kernel exploit

Mon, 04 May 2026 00:00:00 +0000

The recently disclosed CopyFail Linux kernel zero-day (CVE-2026-31431) opens up a way for code running inside a container to break out onto the underlying node. We took a close look at our hubs to check whether they were exposed, concluded that they are likely not at risk, and added another layer of protection just in case.

Are 2i2c’s hubs at risk?

No - based on our testing and mitigation efforts, our hubs are not vulnerable to CopyFail.

Why do we think we’re not at risk?

We tried to reproduce the exploit on a staging hub by following the public Kubernetes proof-of-concept on both GKE and EKS, and the exploit was unable to break out of the container.
Existing JupyterHub hardening on Kubernetes from jupyterhub/kubespawner#545 (originally added by Yuvi in 2021 in response to a different security issue) had already significantly reduced our exposure, as well as that of anyone else running Z2JH (Zero to JupyterHub, the standard way to deploy JupyterHub on Kubernetes).
As an extra layer of protection, we deployed copyfail-ebpf-k8s across our hubs in 2i2c-org/infrastructure#8227. It blocks the specific kernel feature that CopyFail depends on. See the project’s explanation for how that works.
We’ve upgraded all GKE clusters to use a patched image in 2i2c-org/infrastructure#8230.

What else did we look into?

Deckhouse’s mitigation was too platform-specific for us.
OVHcloud’s modprobe blocking likely won’t work on Amazon Linux 2023, since the relevant module is built into the kernel image.
AL2023 security advisories - no patched AL2023 image is available yet, so we can’t rely on a kernel-level fix from AWS for now.

Acknowledgements

Huge thanks to Georgiana for the deep dive into the exploit and whether we’re exposed here.
Thanks to Yuvi for the 2021 PR that reduced JupyterHub’s exposure to this!
Thanks to iwanhae for the eBPF daemonset we deployed in Kubernetes, and to JupyterHub for the upstream kubespawner hardening that lowered our exposure.
Thanks to our collaborators at NASA VEDA for the ongoing conversations about hub security.

Supporting JupyterHub admins on workshop hubs with shared passwords

Mon, 27 Apr 2026 00:00:00 +0000

To help communities use JupyterHub for workshops, we introduced ‘shared password’ authentication a few years ago. This lets communities set a single global password that is handed out to all workshop attendees (instead of collecting email addresses or GitHub usernames before the workshop starts). Users can log in with their email and this shared global password, and get going immediately. We developed this method with Openscapes (which runs a lot of workshops), and it has since been heavily used by other communities too!

However, until now this method required us to restrict admin access for all users, preventing users from using admin features like group management, accessing the JupyterHub Admin dashboard (which lets you see which other users are logged in), and using the shared directory.

To address this, we contributed the SharedPasswordAuthenticator upstream to the JupyterHub organization, allowing hub admins to set two passwords instead of one. Hub admins can now generate a special password that is distributed only to admins and grants these extra capabilities, all without giving up the simplicity of a single password for all workshop participants!
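The two-password idea can be sketched in a few lines of Python. This is a minimal, hypothetical illustration of the logic (the `check_login` name and its return values are ours), not the upstream SharedPasswordAuthenticator code. It uses `hmac.compare_digest`, which is designed to resist timing attacks when comparing secrets.

```python
import hmac
from typing import Optional


def check_login(password: str, user_password: str, admin_password: str) -> Optional[str]:
    """Return "admin", "user", or None for a shared-password login attempt.

    Hypothetical sketch of a two-password scheme: one password for all
    workshop attendees, and a second one handed out only to admins.
    """
    attempt = password.encode()
    # compare_digest avoids short-circuiting on the first differing byte.
    if hmac.compare_digest(attempt, admin_password.encode()):
        return "admin"
    if hmac.compare_digest(attempt, user_password.encode()):
        return "user"
    return None


print(check_login("workshop-2026", "workshop-2026", "a-much-longer-admin-only-secret"))  # user
```

The username is still entered separately (for example, an email address, as described above), so each attendee still gets their own account; the password only decides whether they get admin rights.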

By contributing this upstream, we made sure this feature is available for everyone to use, rather than just for communities served by 2i2c. We’re rolling this out to our communities now, and many communities have already benefitted from this!

Acknowledgements

Thanks to MinRK for reviewing and merging the pull requests!
The Openscapes community for collaborating with us on building features that benefit themselves and everyone else!
Support from our member communities gives us the capacity to invest in upstream open source engagement and build relationships like this

Report from the Jupyter and AI community meetup

Thu, 23 Apr 2026 00:00:00 +0000

A small group of Jupyter community members recently met in Seattle to demo Jupyter workflows augmented with AI tools and discuss what’s next. This is a quick report-out about what stood out.

The conversations focused on what agentic workflows could unlock for researchers and students - with broad acknowledgement that deployment, security, authorization, and ethics still have a lot to be worked out.

Here are a few highlights from the demos.

Jupyter AI and the Agent Client Protocol

Jupyter AI v3 is adding support for the Agent Client Protocol (ACP), enabling multi-agent collaboration within JupyterLab. We saw how a single request could be split across multiple specialized agents based on each one’s skills, similar to this demo from JupyterCon 2025.

Some things still to figure out:

No tool isolation yet - agents aren’t scoped by user role or security group.
Manual configuration required - agents running locally need to be registered by hand.
A future goal is for JupyterLab to auto-detect locally installed agents, or for LLM providers to contribute integrations directly.

The Jupyter AI team also shared a new “Personas” abstraction that lets you define a skill-specific agent in just a few lines of Python. Personas open the door to domain-specific agents or agents with crafted “personalities”. We also discussed the possibility of defining them in Markdown instead of Python. This seems to be evolving rapidly!

Jupyter AI v3.0.0 has since been released.

Notebook CLI

nb-cli is a Rust-based tool for interacting with notebooks from the command line - changing cells, running them, checking errors, and rendering the notebook in the console. It can also connect to remote JupyterLab instances, making it useful for headless workflows like CI pipelines or batch grading.
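nb-cli itself is written in Rust. As a language-neutral illustration of the "checking errors" step it performs, here is a stdlib-only Python sketch (our own, not part of nb-cli) that scans a notebook's JSON for error outputs. It assumes the standard nbformat v4 layout, where code cells carry an `outputs` list and error outputs have `output_type` set to `"error"` along with `ename` and `evalue` fields.

```python
import json


def find_errors(notebook_json: str) -> list[tuple[int, str, str]]:
    """Return (cell_index, ename, evalue) for every error output in a notebook."""
    nb = json.loads(notebook_json)
    errors = []
    for index, cell in enumerate(nb.get("cells", [])):
        # Markdown cells have no "outputs" key, so they contribute nothing.
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                errors.append((index, output.get("ename", ""), output.get("evalue", "")))
    return errors


example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Demo"},
        {"cell_type": "code", "metadata": {}, "execution_count": 1, "source": "1/0",
         "outputs": [{"output_type": "error", "ename": "ZeroDivisionError",
                      "evalue": "division by zero", "traceback": []}]},
    ],
})
print(find_errors(example))  # [(1, 'ZeroDivisionError', 'division by zero')]
```

In a CI pipeline or batch-grading job, a non-empty result could be used to fail the run or flag a submission.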

Sidebar comments and real-time collaboration

A demo showed Google Docs-style side-panel comments for .ipynb files. Previous collaboration efforts used Conflict-free Replicated Data Types (CRDTs) for multi-cursor editing, but the team felt that approach may not have hit the mark. Side-panel comments felt more natural, though how this extends to .py files is still an open question.

Opportunities for 2i2c

The meeting helped us understand the role we could play in this space. We think there’s opportunity to help facilitate the use of agents on shared infrastructure, particularly around the upstream control planes for deployment, administration, and authorization. We can also help facilitate community conversations about the ethics and impact of AI tools on open science.

Acknowledgements

The Jupyter Foundation for supporting the event.
Zach Sailer (Apple), for organizing this meetup.
Chris Holdgraf for editing and adapting the original content for this post.

BIDS joins the mybinder.org federation with help from 2i2c

Thu, 09 Apr 2026 00:00:00 +0000

The Berkeley Institute for Data Science (BIDS) has joined the mybinder.org federation, contributing a new node alongside 2i2c and GESIS. BIDS is the birthplace of mybinder.org, so it’s great to see them back as an active federation member.

We helped out with development, setup, and operations for the new federation member on OVH. Here’s the initiative we just closed out.

This is a win for the resilience of mybinder.org - the BIDS node runs on a different cloud provider than the existing nodes, reducing the risk that a single provider outage takes down the whole service.

Read the full details on the Jupyter blog.

Report from the Jupyter Security Working Group security tooling sprint

Wed, 08 Apr 2026 00:00:00 +0000

The Jupyter Security Working Group recently held a Security Tooling Sprint. It was a timely event given the recent spate of software supply chain attacks across the tech world.

The sprint covered two main areas:

Governance and strategy — conversations about responsibility and accountability in the face of AI, with emphasis on ensuring humans are ultimately responsible for code committed to Jupyter subprojects. The group also discussed how security could benefit from working group members regularly attending subproject meetings like the JupyterHub Collaboration Cafes.
Automation and tools — the group evaluated several tools for improving security posture across the Jupyter ecosystem. Here are a few that stood out:
- Semgrep as an alternative vulnerability scanner to CodeQL
- Grype, Checkov, and Kubescape for cloud infrastructure misconfiguration checks
- Schemathesis and restler-fuzzer for API fuzz testing

One challenge we discussed was that running security scanning tools without tuning generates many false positives. There’s real effort needed to tune these tools for each project’s edge cases before they’re useful in automation. On a related note, we discussed the increase in AI-generated (or AI-assisted) vulnerability and security reports, and the challenge of sifting through them all.
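As an illustration of what that tuning can look like, the sketch below filters scanner findings against a per-project baseline of findings that maintainers have already reviewed and accepted. The `Finding` shape, the baseline format, and the file paths are hypothetical; real scanners such as Semgrep ship their own suppression mechanisms (for example, `nosemgrep` comments).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule_id: str  # identifier of the scanner rule that fired
    path: str     # file the finding points at


def triage(findings: list[Finding], accepted: set[tuple[str, str]]) -> list[Finding]:
    """Drop findings that maintainers have already reviewed and accepted."""
    return [f for f in findings if (f.rule_id, f.path) not in accepted]


findings = [
    Finding("hardcoded-secret", "tests/fixtures/fake_token.py"),
    Finding("subprocess-shell-true", "src/example/app.py"),
]
# Hypothetical baseline: test fixtures intentionally contain fake secrets.
accepted = {("hardcoded-secret", "tests/fixtures/fake_token.py")}
print(triage(findings, accepted))
```

Keeping the baseline in the repository means each accepted false positive is a reviewed, version-controlled decision rather than noise that every CI run re-reports.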

Acknowledgements

Thanks to the Jupyter Security Working Group for providing leadership and organizing, in particular Joe Lucas!
Thanks to the Jupyter Foundation for funding community meetings like these.

Upgrading community infrastructure to Kubernetes 1.34 and JupyterHub 4.3.3

Wed, 08 Apr 2026 00:00:00 +0000

We’ve completed a major round of infrastructure upgrades across all 2i2c-managed hubs - every hub is now running Kubernetes 1.34 and Z2JH helm chart 4.3.3.

Running up-to-date versions of both Kubernetes and the JupyterHub helm chart ensures that our communities get the best support and reliability, both in terms of features and security.

A new approach to infrastructure upgrades: upgrading in rounds

This was the first time we rolled out JupyterHub helm chart upgrades in rounds rather than all at once. By upgrading a subset of hubs at a time, we could identify and fix issues in isolation before they affected the broader network. This made the process safer and more predictable.
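A minimal sketch of the rounds idea, assuming a hypothetical `upgrade` callable that returns True on success: hubs are split into rounds, and the rollout pauses after the first round with a failure, so problems are fixed before they reach the rest of the network. The round sizes and hub names are illustrative, not our actual schedule.

```python
from typing import Callable


def split_into_rounds(hubs: list[str], sizes: list[int]) -> list[list[str]]:
    """Split hubs into rounds of the given sizes; any leftovers form a final round."""
    rounds, start = [], 0
    for size in sizes:
        if start >= len(hubs):
            break
        rounds.append(hubs[start:start + size])
        start += size
    if start < len(hubs):
        rounds.append(hubs[start:])
    return rounds


def roll_out(rounds: list[list[str]], upgrade: Callable[[str], bool]) -> list[str]:
    """Upgrade round by round; stop after the first round that has a failure."""
    upgraded = []
    for hubs in rounds:
        failures = [hub for hub in hubs if not upgrade(hub)]
        upgraded.extend(hub for hub in hubs if hub not in failures)
        if failures:
            print(f"Pausing rollout: fix {failures} before continuing")
            break
    return upgraded


hubs = [f"hub-{i}" for i in range(10)]
rounds = split_into_rounds(hubs, [1, 3])  # a single canary hub, a small batch, then the rest
# Hypothetical upgrade function that simulates a failure on one hub.
upgraded = roll_out(rounds, upgrade=lambda hub: hub != "hub-2")
print(upgraded)  # ['hub-0', 'hub-1', 'hub-3']
```

Starting with a very small first round limits the blast radius of an unexpected problem, while later rounds can be larger once the change has proven itself.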

We’re planning to perform these kinds of upgrades on a regular schedule for our member communities. Roughly every six months, we’ll open an issue to make sure nothing falls through the cracks (here’s an example config for creating our reminder issues).

Check out our process docs for multi-hub upgrades for more information.

Learn more

Check out these pages to see the improvements these latest updates bring to our clusters and hubs.

Acknowledgements

Thanks to Georgiana Dolocan for leading this upgrade effort and establishing the rounds-based approach.
Thanks to Chris Holdgraf for adapting and editing Georgiana’s notes into a blog post.

Combining multiple repos into one site at jupyterbook.org

Mon, 06 Apr 2026 00:00:00 +0000

As part of an initiative to improve jupyterbook.org’s documentation, we refactored the site so that multiple repositories are served under one domain. We wrote up the details on the Jupyter Book blog.

Read the full post here:

How we combine multiple repositories into one website at jupyterbook.org.

Along the way we made several upstream improvements to the MyST Engine and the MyST theme:

`parts:` support for `extends:` and URLs, so we can share a footer / navbar configuration across repositories
`internal_domains` option so that links to another repository’s content can still be treated as internal
Less aggressive citation parsing so that text like @githubhandle isn’t parsed as a citation
Several mobile and UX fixes (that’s just one example; there were many others!)
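To make the `internal_domains` idea concrete, here is a Python sketch of the classification it enables: a link counts as internal if it is relative or if its host appears in a configured list. This is an illustration only; the function name and domain list are hypothetical, and the MyST implementation (written in TypeScript) may behave differently.

```python
from urllib.parse import urlparse


def is_internal(link: str, internal_domains: list[str]) -> bool:
    """Return True for relative links and for absolute links to a listed domain."""
    parsed = urlparse(link)
    if not parsed.scheme and not parsed.netloc:
        # Relative links such as "../guide/" or "/docs/intro" stay on this site.
        return True
    # Absolute links count as internal only if their host is listed exactly.
    return (parsed.hostname or "") in internal_domains


domains = ["jupyterbook.org"]  # example configuration, for illustration only
print(is_internal("https://jupyterbook.org/stable/", domains))  # True
print(is_internal("/docs/intro", domains))                      # True
print(is_internal("https://mystmd.org/guide", domains))         # False
```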

These all felt particularly relevant for documentation that our member communities manage, where you have content split across multiple repositories but served at a single domain.

Acknowledgements

Thanks to Project Pythia and EarthScope for collaboration and feedback that helped shape this work. And thanks to our member communities whose memberships fund upstream contributions like these.

Multi-repo site support is also useful for training programs that span several content repositories, including NASA Open Science / ScienceCore.

How regularly upgrading core infrastructure leads to upstream improvements and better infrastructure

Fri, 03 Apr 2026 00:00:00 +0000

Our collaborators at NASA VEDA recently asked us about the rationale behind our policy of upgrading infrastructure relatively quickly when new versions come out. Here’s the explanation we shared with them, in case it’s useful for others as well.

In this case, the decision was whether to upgrade to Helm 4, and you can find our rationale in the /initiatives repository. Here’s a brief summary from Yuvi:

Fundamentally, it helps keep moving us and the ecosystem forward, and drive improvements upstream, in both JupyterHub and Helm.

It has driven these PRs in JupyterHub:

jupyterhub/action-k3s-helm#126 (merged)
jupyterhub/zero-to-jupyterhub-k8s#3797 (validated, but not merged yet)

It’s also driven improvements to helm itself - see this bug report that is being worked on:

helm/helm#31919

Upgrading helm versions can break things (and it has for some of our other communities in the past - see this example). So it’s important we do that on a reasonable timeframe and carefully, to avoid disruptions.

We’re also discovering for example that potentially the new nginx-ingress controller we had to move to has some issues working with older helm versions (ongoing WIP in 2i2c-org/infrastructure#7995). That feels much more tractable because we can now go ‘ok, let us just apply a quick fix now, and wait for the helm 4 rollout, and try again’ instead of being totally stuck.

This is similar to the other part of [/our VEDA objective] - rolling out new versions of jupyterhub. If we need to roll out security fixes, it’s much easier now because we already did the hard work of being up to date:

2i2c-org/infrastructure#7996

This isn’t the case quite yet for helm v3, as it’s still supported, but it’s much better to do this work earlier than wait.

If you encounter a bug in popular open source software, you can often just ‘wait’ for it to be fixed. But this isn’t just about time - someone somewhere has to put in the effort of getting it fixed, filing helpful upstream bug reports, and testing to make sure the fix works. This is an example of 2i2c continuing to contribute this effort upstream wherever we can.

Acknowledgements

Thanks to NASA VEDA for collaborating deeply with us on infrastructure questions like this.

Better sharing UX with nbgitpuller and contextual error handling

Tue, 31 Mar 2026 00:00:00 +0000

TL;DR

nbgitpuller now has an improved UX with context-aware error handling. Update to version 1.3.0 and let us know what you think by opening an issue or via the feedback form below 🚀

Check out the initiative we used to structure and complete this work:

GitHub initiative

What is nbgitpuller?

nbgitpuller is a way to sync content with compute through the click of a link. Example use cases include:

Interactive research demos, such as Spyglass (HHMI)
Workshops and training scenarios
University exams

In the case of Spyglass, the content is LorenFrankLab/spyglass-demo hosted on GitHub, the compute is a 2i2c cloud hub hosted at spyglass.hhmi.2i2c.cloud, and by using the handy nbgitpuller link generator, you can generate this nbgitpuller link to share with others to seamlessly explore content on the desired compute platform with the relevant data and toolchains installed.
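Under the hood, an nbgitpuller link URL-encodes the content location as query parameters (`repo`, `branch`, `urlpath`) on the hub's `/hub/user-redirect/git-pull` endpoint. Below is a minimal sketch for the Spyglass example. The helper name is hypothetical, and the `main` branch and JupyterLab landing path are assumptions; the link generator remains the reliable way to produce real links.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def make_nbgitpuller_link(hub_url: str, repo: str, branch: str, urlpath: str) -> str:
    """Build a link that makes nbgitpuller sync `repo`, then open `urlpath`."""
    query = urlencode({"repo": repo, "branch": branch, "urlpath": urlpath})
    # user-redirect sends each person to the /git-pull endpoint on their own server.
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"


link = make_nbgitpuller_link(
    "https://spyglass.hhmi.2i2c.cloud",
    "https://github.com/LorenFrankLab/spyglass-demo",
    "main",                     # assumed branch name
    "lab/tree/spyglass-demo/",  # assumed landing page in JupyterLab
)
print(link)

# Round-trip check: after URL decoding, the endpoint sees the original values.
print(parse_qs(urlparse(link).query)["repo"])
```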

How does nbgitpuller work?

nbgitpuller is installed in the compute environment, which then exposes a /git-pull endpoint that reads where to pull content from out of the URL parameters. Syncing content critically depends on git operations such as fetch, checkout, clone, merge, and commit.

When it goes wrong

Based on the data sent through by the kind folks running the Berkeley DataHub, there were 2163 logs available, of which:

983 (45%) were ‘merge’ conflicts
493 (23%) were ‘fetch’ errors
467 (22%) were ‘ls-remote’ errors

The remaining errors (10%) were mostly ‘checkout’ errors. These counts show the errors that students run into most often.
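The percentages above follow directly from the raw counts. As a quick check (the 220 "remaining" errors are derived by subtraction from the total):

```python
total = 2163
counts = {"merge": 983, "fetch": 493, "ls-remote": 467}
# Whatever is left over is the "remaining" bucket, mostly checkout errors.
counts["other (mostly checkout)"] = total - sum(counts.values())

for kind, n in counts.items():
    print(f"{kind}: {n} ({n / total:.0%})")
# merge: 983 (45%)
# fetch: 493 (23%)
# ls-remote: 467 (22%)
# other (mostly checkout): 220 (10%)
```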

The same set of logs contained 172 unique errors, which better reflect the mistakes made by instructors: most ‘ls-remote’ errors come from mistakes in the content repo URL.

Merge conflicts

If the link author changes content after the link consumer has clicked a link, then nbgitpuller needs to sync those updates for the consumer on subsequent link clicks. The nbgitpuller merging strategy makes opinionated choices so that the link consumer never has to interact with git and their working changes are always preserved.

Things can go wrong when:

Consumers diverge the git history by making their own git commits
Authors diverge the git history by force-pushing
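For readers curious how the "never interact with git" behaviour works, the sketch below lists a simplified sequence of git commands for a single link click. It is modelled loosely on nbgitpuller's documented strategy (snapshot the consumer's edits, then merge upstream while preferring local changes via git's `-Xours` option). The helper name is hypothetical, and nbgitpuller does additional work not shown here, such as handling deleted and untracked files.

```python
def plan_sync(repo_url: str, branch: str, already_cloned: bool) -> list[list[str]]:
    """Return a simplified sequence of git commands for one link click.

    Hypothetical helper modelled loosely on nbgitpuller's documented
    behaviour; it is not nbgitpuller's actual code.
    """
    if not already_cloned:
        # First click: a plain clone of the requested branch.
        return [["git", "clone", "--branch", branch, repo_url]]
    return [
        # Fetch the author's latest commits without touching the working tree.
        ["git", "fetch"],
        # Snapshot the consumer's edits to tracked files in a local commit.
        ["git", "commit", "--all", "--allow-empty", "-m", "WIP"],
        # Merge upstream, resolving conflicting hunks in favour of local edits.
        ["git", "merge", "-Xours", f"origin/{branch}"],
    ]


for command in plan_sync("https://github.com/example/course", "main", already_cloned=True):
    print(" ".join(command))
```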

Error UX (old)

Problems with the old UX include:

a scary terminal
difficulty for the user to figure out what went wrong
no suggestion for the user to fix the problem or signpost to continue to the compute platform

Error UX (new)

New improvements to the UX include:

terminal closed by default, but you can optionally toggle this open
there is a copy to clipboard button to easily share the error log from the terminal
more user-friendly and context aware helper message is displayed
a link to the general documentation for reference
a ‘Proceed without syncing’ button takes the user to the compute platform without making any changes
for merge errors, an extra ‘Backup and resync’ button is presented

Give us feedback! Click here to provide feedback that will help us make this more impactful.

Learn more

Acknowledgements

UC Berkeley and the CloudBank Classroom project
CAL ICOR for co-funding
Eric Van Dusen and Sean Morris for championing this work
Balaji Alwar for providing the data and sharing feedback
Nicolas M. Thiéry for feedback on the UX design

Jenny Wong joins the JupyterHub team

Mon, 09 Feb 2026 00:00:00 +0000

We’re excited to share that Jenny Wong has been invited to join the JupyterHub team as a contributor and maintainer.

Jenny’s contributions to nbgitpuller and grafana-dashboards, along with her active participation in project meetings and community planning, earned her this recognition from the JupyterHub community. Being invited to a project team means that existing team members have recognized a pattern of high-quality contributions and trust in a person’s ability to steward the project.

We’re particularly excited about this because our mission isn’t just about deploying infrastructure - it’s about being good citizens in the open source communities we depend on. We invest in upstream contributions, participate in community governance, and aim to build the kind of relationships that strengthen the whole ecosystem. When our team members are welcomed into upstream project teams, it’s a signal that we’re doing this well.

Learn more

JupyterHub team compass - how the team is structured and how new members are added
JupyterHub proposal to add Jenny to the team
nbgitpuller and grafana-dashboards - projects Jenny contributes to

Acknowledgements

The JupyterHub community for fostering an open and welcoming project culture
Support from our member communities gives us the capacity to invest in upstream open source engagement and build relationships like this

Introducing Jupyter Book 2 at FOSDEM 2026

Fri, 06 Feb 2026 00:00:00 +0000

Our teammate Angus Hollands gave a talk at FOSDEM 2026: Introducing Jupyter Book 2.

The talk shares why the Jupyter Book 2 and MyST stack was rebuilt, and how it supports open, reusable computational publishing workflows.

Slide from the FOSDEM 2026 session introducing Jupyter Book 2.

Learn more

Acknowledgements

Thanks to Project Pythia and to BIDS for supporting deeper foundational contributions like this in Jupyter Book.
Thanks to the Jupyter Book team for helping out with slide creation and doing all the work Angus spoke about here!

New Jupyter Book / MyST stack release (Jan 2026)

Thu, 29 Jan 2026 00:00:00 +0000

The MyST/Jupyter Book stack shipped new releases this week. Release pages:

Where we contributed

We’ve spent extra time lately trying to fix bugs and generally improve stability, reliability, and UX papercuts. Here are some of the things we focused on:

Extra review time to land concurrent execution support in mystmd
Fixing broken edit URL logic so our community “edit buttons” worked again
Standardized link styles so users know what to expect from links
Added extra hover metadata for GitHub issues and PRs, for communities that often link to their GitHub issues
Made all of these releases and wrote up the release notes!

Most of our contributions were foundational in nature - we fixed a bunch of bugs, did review on the PRs of others, and managed the release process itself. Check out the changelogs for more details!

Acknowledgements

Thanks to our member communities - their memberships cover the cost of foundational upstream contributions to projects like these.

Particular thanks to Project Pythia which currently supports much of our upstream contributions in Jupyter Book.

These releases also support training communities that author curricula in MyST, including NASA ScienceCore.

STRUDEL enables rapid scientific GUI prototyping in partnership with 2i2c

Tue, 27 Jan 2026 00:00:00 +0000

What happened

The STRUDEL team hosted an all-day workshop with over thirty participants prototyping web applications using the STRUDEL Design System and AI assistants in a custom hub environment designed and managed by 2i2c. By the end of the day, all of the participants had a working prototype that incorporated their own data (or dummy data) into complex flows facilitated by the STRUDEL Design System.

After a brief introduction to STRUDEL, participants were guided through setting up their personal coding environments on the STRUDEL Hub that 2i2c managed. The hub was configured to launch a unique code repository for each participant, set up before the workshop in the strudel-workshops GitHub organization.

“Having a startup environment was very nice, as often getting a good development environment set up is half the battle for smaller projects.”

The hub used VS Code for the Web, pre-configured with the Cline AI assistant extension. Participants configured Cline with a shared API key generated by the STRUDEL team via OpenRouter. OpenRouter enabled the team to load credits into a shared account and API key that, in turn, enabled participants to use premium models inside of Cline.

“Super easy to set up Cline in the VM, I appreciated that”

The day was split up into four sprints during which participants worked on different parts of their user interface application, with the majority of the participants working entirely in the 2i2c environment.

“I just want to express my gratitude for such an awesome day today. The workshop was really well structured and facilitated, and I learned a lot. Thank you so much for letting me come!”

Why we’re excited about this

We think it’s a great example of setting up a complex environment once, and then providing rapid access to these environments via a centralized hub.

This setup accelerated prototyping by removing the burden of setting up a development environment. The setup enabled participants, many of whom had never coded a web application or used an AI coding assistant before, to work seamlessly towards the goals of their design and development projects. The work they produced may continue beyond the workshop and have an impact on scientific discovery and operations.

This setup is a valuable mechanism for encouraging people to build within a pre-existing design system. Being able to launch repositories that are preconfigured with design system tools and templates is a powerful way to promote the adoption of a design system and its embedded patterns and best practices.

It’s also an interesting example of non-Jupyter interfaces orchestrated on a JupyterHub. The combination of VS Code for the Web, Cline, and OpenRouter represents a stack that is easily transferable to other similar workshops. OpenRouter enabled the workshop team to manage the costs of AI usage entirely themselves.

In all, the participants and instructors were able to focus on their work instead of setting up and managing infrastructure.

Links to learn more

Learn more about the workshop on the STRUDEL website: strudel.science/engage/news/10-23-2025-building-scientific-uis-with-strudel-and-ai-assistants/
Explore STRUDEL + AI assistant tips and tricks: strudel-science/strudel-kit/blob/main/docs/docs/usage-with-ai.md

Acknowledgements

We would like to thank all workshop participants. STRUDEL is an open source project housed at the Berkeley Institute for Data Science (BIDS) at the University of California, Berkeley. The STRUDEL team includes members of the Lawrence Berkeley National Lab Scientific Data (SciData) Division UX team, Superbloom Design, The Carpentries, and 2i2c. The project is generously funded by the Alfred P. Sloan Foundation, Liz Vu & Josh Greenberg Program Officers, grants G-2022-19360, G-2023-21098, and G-2024-22557.

April joins the Jupyter Community Building Working Group

Fri, 23 Jan 2026 00:00:00 +0000

We’re pleased to share that our People Operations Lead, April Johnson, has joined the Jupyter Community Building Working Group!

This kind of work reflects how we think about foundational contributions and upstream support: strengthening the social infrastructure that helps open source communities grow and thrive, not just contributing code and running infrastructure. April brings deep experience building teams and communities, and we’re proud to support her upstream efforts in this way.

Acknowledgements

Thanks to the Jupyter Community Building Working Group for their leadership in building a stronger Jupyter community, and for welcoming April

NASA Open Science ScienceCore tutorial available at github.com/sciencecore

Thu, 18 Dec 2025 00:00:00 +0000

As part of the NASA Open Science initiative, we co-developed a ScienceCore curriculum module with MetaDocencia that teaches researchers and educators how to use NASA Earthdata in the cloud to analyze climate risk.

You can find the material at github.com/sciencecore:

climaterisk: the Determining Climate Risks with NASA Earthdata Cloud module (source repo).
scipy-2024-climaterisk and pydata-nyc-2024-climaterisk: conference tutorial versions of the climaterisk material.

We also run the shared sciencecore.opensci.2i2c.cloud JupyterHub and an opensci BinderHub that host the hands-on exercises. These deployments use JupyterHub and BinderHub for hosting live computational infrastructure, and Jupyter Book for hosting the reading materials.

Acknowledgements

This work was supported by a NASA TOPS-T award that funded the ScienceCore climaterisk module and shared infrastructure for the broader ScienceCore community.

Thanks to MetaDocencia for leading translation and localization of the material into Spanish, and helping with content creation.

Thanks to Dhavide Aruliah for leading much of the content creation.

Faster reporting of user home directory sizes

Tue, 09 Dec 2025 00:00:00 +0000

Storage quotas help users avoid running out of space unexpectedly and give administrators visibility into capacity planning. However, storage usage can change rapidly, so administrators need timely information to know whether they are close to hitting limits.

We’ve improved how quickly hub administrators can see user home directory sizes across our JupyterHubs. This makes monitoring more responsive and adds quota limit visibility that wasn’t possible before.

Using `jupyterhub-home-nfs` for near-instant disk usage metrics

Our existing storage monitoring tool, prometheus-dirsize-exporter, deliberately runs slowly to avoid excessive disk I/O. This meant home directory metrics could be hours out of date on systems with many users or large directories. Plus, there was no way to report user quota limits at all.
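To see why a scanning exporter is inherently slow, consider what it has to do: visit every file under each home directory and add up its size. Below is a minimal stdlib sketch of that approach (not prometheus-dirsize-exporter's actual code), with a hypothetical `pause_seconds` parameter that throttles disk I/O at the cost of slower results.

```python
import os
import tempfile
import time
from pathlib import Path


def dir_size_bytes(root: str, pause_seconds: float = 0.0) -> int:
    """Sum apparent file sizes under root, optionally sleeping between entries."""
    total = 0
    stack = [root]
    while stack:
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += entry.stat(follow_symlinks=False).st_size
                # Sleeping per entry keeps I/O low but makes big trees take hours.
                if pause_seconds:
                    time.sleep(pause_seconds)
    return total


# Tiny demo on a throwaway directory standing in for a user's home directory.
with tempfile.TemporaryDirectory() as home:
    Path(home, "notebook.ipynb").write_bytes(b"x" * 10)
    Path(home, "data").mkdir()
    Path(home, "data", "big.csv").write_bytes(b"y" * 5)
    demo_total = dir_size_bytes(home)
print(demo_total)  # 15
```

The cost grows with the number of files, which is why the quota-based approach described next can report usage without any per-file work.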

Our home directory storage is managed by jupyterhub-home-nfs, which enforces per-user quotas, so it was well placed to also expose usage and limit information as Prometheus metrics, using data from the underlying filesystem quota system. Because this information is already tracked by the filesystem, it’s available immediately without scanning individual files.

We made two key improvements:

Make disk usage reporting almost instantaneous. We made jupyterhub-home-nfs export total_size_bytes and hard_limit_bytes metrics to Prometheus for near-instant reporting. We used the same metric names and namespace as prometheus-dirsize-exporter for compatibility. See 2i2c-org/jupyterhub-home-nfs#76
Allow this to be used upstream in JupyterHub Grafana Dashboards so that it can support both types of disk usage reporting. This means users of the upstream JupyterHub Grafana dashboards get the same useful view about home directory usage, regardless of whether the metric comes from prometheus-dirsize-exporter or jupyterhub-home-nfs. See 2i2c-org/prometheus-dirsize-exporter#29
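Because the filesystem already tracks per-user usage and limits for quota enforcement, exporting them is mostly a formatting step. Below is a hedged sketch that renders such numbers in Prometheus's text exposition format. It assumes a `dirsize_` namespace and a `directory` label; these are our assumptions for illustration, so check the exporters' documentation for the exact names.

```python
def render_metrics(usage: dict[str, tuple[int, int]], namespace: str = "dirsize") -> str:
    """Render {directory: (used_bytes, hard_limit_bytes)} as Prometheus text format."""
    lines = []
    for suffix, index, help_text in (
        ("total_size_bytes", 0, "Total size of the directory in bytes"),
        ("hard_limit_bytes", 1, "Hard quota limit for the directory in bytes"),
    ):
        name = f"{namespace}_{suffix}"
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        for directory, values in sorted(usage.items()):
            # Label values are not escaped here; fine for simple usernames.
            lines.append(f'{name}{{directory="{directory}"}} {values[index]}')
    return "\n".join(lines) + "\n"


print(render_metrics({"alice": (1_073_741_824, 10_737_418_240)}), end="")
```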

These changes were deployed across all our communities, so administrators can now access current home directory information within minutes regardless of directory size.

Home Directory Usage dashboard showing total size metrics from jupyterhub-home-nfs and other data from prometheus-dirsize-exporter

Try it out

2i2c member organizations can try this out now. If you have access to your hub’s Grafana instance, you can see these new metrics in the Home Directory Usage dashboard:

Open your hub’s Grafana dashboard.
Go to Dashboards -> JupyterHub Default Dashboards -> Home Directory Usage.
Check the table for up-to-date total size and quota limit values.

For more details, see our docs on filesystem and disk dashboards.

Coming next

We’d like to build on this work to enable alerting when individual users near their disk quotas. This will make it easier to track user disk usage reliably across a community. See this issue for tracking: 2i2c-org/infrastructure#7166
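At its core, the planned alerting compares each user's usage with their hard limit. Below is a minimal sketch with a hypothetical 90% threshold and made-up usernames. In practice this would more likely be a Grafana or Prometheus alert rule over the usage and limit metrics that jupyterhub-home-nfs now exports.

```python
def users_near_quota(usage: dict[str, tuple[int, int]], threshold: float = 0.9) -> list[str]:
    """Return users whose used/limit ratio is at or above threshold."""
    flagged = []
    for user, (used, limit) in sorted(usage.items()):
        # Users without a positive limit are skipped rather than divided by zero.
        if limit > 0 and used / limit >= threshold:
            flagged.append(user)
    return flagged


usage = {"alice": (9_500, 10_000), "bob": (2_000, 10_000), "carol": (9_000, 10_000)}
print(users_near_quota(usage))  # ['alice', 'carol']
```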

Acknowledgements

This was a directed contribution supported by NASA VEDA to enable more proactive monitoring and alerting for hub administrators.

Supporting NASA Openscapes Champions with Cloud Infrastructure

Sun, 30 Nov 2025 00:00:00 +0000

Openscapes ran a NASA Champions program in November, bringing 30 participants together to learn about NASA Earthdata and the earthaccess Python library. We provided JupyterHub infrastructure for hands-on breakout sessions - a good example of using shared infrastructure to facilitate learning and collaboration in remote events.

They used their JupyterHub for co-working, where participants practiced streaming techniques for accessing cloud data without downloading. Multiple NASA Data Centers (NSIDC, ORNL, ASDC, PO.DAAC) collaborated to co-teach using the shared environment, succeeding despite the event happening the day after a government shutdown.

They also used this to grow the Openscapes community by getting attendees to join their Slack and sign up for their December Earth Access hack day. It’s a great example of leveraging shared community infrastructure to help newcomers learn quickly and join a science community.

Read their full event summary to learn how they structured the program and engaged their community.

Adding User Group Insights to Cloud Cost Dashboards with Grafana

Mon, 24 Nov 2025 00:00:00 +0000

We are excited to announce that we have extended our cloud cost dashboards in Grafana to display costs filtered by user group! This new feature allows administrators to monitor and manage cloud expenses based on user group memberships in JupyterHub.

This is available only for dedicated AWS clusters (excluding CloudBank-managed accounts). Support for deployments on GCP is planned for the future.
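One way to picture group-level cost reporting is as an aggregation of per-user costs by group membership. Below is a minimal sketch with made-up numbers. How jupyterhub-cost-monitoring actually attributes costs (for example, for users in several groups) is not shown here and may differ. In this sketch, a user in two groups counts toward both, so group totals can exceed the overall total.

```python
from collections import defaultdict


def cost_by_group(user_costs: dict[str, float],
                  memberships: dict[str, list[str]]) -> dict[str, float]:
    """Sum per-user costs into per-group totals; ungrouped users go to 'no-group'."""
    totals: dict[str, float] = defaultdict(float)
    for user, cost in user_costs.items():
        for group in memberships.get(user) or ["no-group"]:
            totals[group] += cost
    return dict(totals)


user_costs = {"alice": 12.5, "bob": 3.0, "carol": 7.25}
memberships = {"alice": ["research"], "bob": ["research", "teaching"]}
print(cost_by_group(user_costs, memberships))
# {'research': 15.5, 'teaching': 3.0, 'no-group': 7.25}
```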

Learn more #

Take a look at the Community Hub Guide to see what’s new
Check out the documentation of the 2i2c-org/jupyterhub-cost-monitoring project to see how it all works
Jenny presented her work on the cost monitoring system at JupyterCon 2025 earlier this month. Watch a video or look at the slides.

Give us feedback! Click here to provide feedback that will help us make this more impactful.

Acknowledgements #

Tarashish @ Development Seed for collaborating on this project with us.
NASA VEDA and the DSE Team at NASA MSFC ODSI for funding much of this work.
Kyle Lesinger from the NASA MSFC Office of Data Science and Informatics for providing valuable feedback and bug reports during development.

2i2c at JupyterCon 2025: Helping communities navigate the Interactive Computing ecosystem

Thu, 20 Nov 2025 00:00:00 +0000

This year several team members attended JupyterCon 2025 to show off our own work and the upstream work that we’ve been doing in open source. JupyterCon recently shared the videos of all talks, so here’s a quick run-down of 2i2c’s contributions and where you can watch more.

Building computational narratives with Jupyter Book #

Introducing Jupyter Book 2: Next-generation Tools for Creating Computational Narratives - Chris Holdgraf and Rowan Cockett (Curvenote) introduce the next generation of Jupyter Book, built on modern tooling and designed for creating rich computational narratives.

Tutorial: Build-a-Jupyter Book With the Turing Way - Angus Hollands co-led this hands-on tutorial teaching participants how to create their own Jupyter Books using examples from the Turing Way.

JupyterHub’s evolution and sustainable operations #

Not Just for Notebooks: JupyterHub in 2025 - Yuvi Panda explores how JupyterHub is evolving beyond just notebooks to support a wider range of interactive computing workflows.

Cloudy With a Chance of Savings: Per-User Usage and Cost Monitoring for JupyterHubs in the Cloud - Jenny Wong presents our recent work improving tools and approaches for monitoring per-user cloud costs in JupyterHub deployments, helping communities operate more sustainably.

Lightning Talk: Controlling Home Directory Costs (with User Empathy) on the Cloud - Yuvi Panda shares practical strategies for managing home directory storage costs while keeping user experience in mind, using jupyterhub-home-nfs.

Finally, there were also several talks that weren’t by 2i2c team members, but were partially enabled by 2i2c’s collaboration. We’re particularly proud of these, because they’re examples of us bringing others into the ecosystem and empowering them to contribute.

Understanding the JupyterHub community #

Findings from the Voices of JupyterHub report - This community strategy talk shares insights from conversations with JupyterHub users and operators about their needs and challenges. It’s not given by a 2i2c team member, but many of us have been involved in guiding (and being participants in!) this project.

MyST-ifying Project Pythia - Julia Kent of NSF NCAR discusses Project Pythia, an open-source platform dedicated to educating geoscientists on Python for complex Earth data analysis. Learn how Project Pythia manages its expansive repository of “cookbooks” and educational content, detailing their strategic shift to MyST Markdown and Jupyter Book 2 to drastically improve project sustainability and reduce maintenance overhead.

How CryoCloud built a healthy open science community #

Building a successful open science community in the cloud - Tasha Snow shares key insights from running the CryoCloud JupyterHub, emphasizing that a successful scientific community relies on both technology and social innovation. She shares data-driven results on how shared JupyterHubs can significantly reduce research computing costs and accelerate scientific iteration. She also explores the critical balance between platform capabilities and the need for social infrastructure to overcome technical barriers and foster true collaboration.

Yuvi on scaling maintainer intuition to facilitate PR review with PR triage boards

Wed, 19 Nov 2025 00:00:00 +0000

Yuvi has a recent post on the Jupyter blog on how his “maintainer intuition” about reviewable pull requests grew into the open-source pr-triage-board-bot, a reusable workflow that keeps GitHub Project boards curated for the JupyterHub, JupyterLab, and GeoJupyter communities. Foundational contributions are really important to 2i2c. This is a great example of building clever technical systems that help maintainers prioritize the social work of facilitating contributions to keep ecosystems healthy.

The JupyterHub PR Triage board.

Our favorite quote shares the vibe and cultural principles that drive this effort:

If the author of the PR is a newish contributor, I want to encourage them to stick around by being responsive to their gift. All PRs are gifts that we may or may not choose to accept, but should do so with grace.
Yuvi

Yuvi frames the problem here:

Reviewing PRs is a critical way that maintainers keep an open source project moving forward, but identifying PRs that can productively be merged is hard.

And notes that human scalability is often a big bottleneck:

One key bottleneck we identified in the process was Step 2. In particular, I was relying on my maintainer intuition to pick a single PR that I believe can be merged, so others in the team can do review work. I started exploring what this intuition is, and if it can be scaled.

Here’s his list of “intuitions” that he uses to choose PRs to work on:

PRs that aren’t too big, and are a reasonable size that can be merged within a 2 week window

CI tests passing, so at least our automated checks haven’t caught any issues with it

Features or bug fixes that I believe add value to the project and move us in the right direction towards being able to support our users as they need (this is the hardest!)

If the author of the PR is a newish contributor, as I want to encourage them to stick around by being responsive to their gift. All PRs are gifts that we may or may not choose to accept, but should do so with grace.

How long ago the PR was opened. There is such a big difference between a response to your PR 2 days after you make it vs 2 months vs 2 years. I prioritized newer PRs.

What kind of contribution is it primarily? Different engineers on our team have different skillsets (JS, Python, etc) and I wanted to match the PR to what the engineer preferred code reviewing.
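To make a few of these intuitions concrete, here is a hypothetical scoring sketch. It is not how pr-triage-board-bot works: the bot groups PRs into board views such as “First Time Contributor” and “Seasoned Contributor”, while this sketch folds several intuitions into one score. The `PullRequest` fields, the 300-line size cutoff, the 60-day age fade, and the weights are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class PullRequest:
    # Hypothetical fields; a real tool would fill these from the GitHub API
    number: int
    lines_changed: int
    ci_passing: bool
    author_is_new: bool
    opened_at: datetime


def triage_score(pr: PullRequest, now: datetime) -> float:
    """Higher scores suggest PRs to review first (illustrative weights)."""
    if not pr.ci_passing:
        return 0.0  # automated checks have already found a problem
    score = 1.0
    if pr.lines_changed <= 300:  # assumed cutoff for "mergeable within ~2 weeks"
        score += 2.0
    if pr.author_is_new:  # be responsive to newer contributors' gifts
        score += 1.5
    age_days = (now - pr.opened_at).days
    score += max(0.0, 1.0 - age_days / 60)  # favor newer PRs, fading over ~2 months
    return score


now = datetime(2025, 11, 19, tzinfo=timezone.utc)
prs = [
    PullRequest(1, 1200, True, False, now - timedelta(days=400)),
    PullRequest(2, 80, True, True, now - timedelta(days=2)),
    PullRequest(3, 40, False, True, now - timedelta(days=1)),
]
ranked = sorted(prs, key=lambda p: triage_score(p, now), reverse=True)
print([p.number for p in ranked])  # [2, 1, 3]
```

Matching a PR to a reviewer's preferred language (the last intuition above) isn't modeled here; one option would be to filter the list per reviewer before ranking.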

And the board essentially tries to capture many of these intuitions by signal-boosting them in one place:

we can roughly say ‘Pick a PR that looks good to you from the top of the “First Time Contributor” or “Seasoned Contributor” list’, and that relieves me from being the bottleneck quite a bit.

Read more in the original article: Scaling “Maintainer Intuition” with Pull Request Triage Boards.

Acknowledgements #

Project Jupyter for trusting us to incubate and now donate the bot code to the broader ecosystem.
JupyterHub and GeoJupyter contributors who tested the triage views and fed real maintainer workflows back into the design.
Jason Grout, Raniere Silva, and Matt Fisher for spotting the experiment early and helping it land across multiple orgs.

Creating a re-usable redirect generator for Jupyter Book 1 migrations

Wed, 12 Nov 2025 00:00:00 +0000

When migrating documentation from Jupyter Book 1 to Jupyter Book 2, URL structures change dramatically and break external links. We spent some time creating a re-usable tool to solve this problem across multiple projects.

You can check out the tool below:

jupyter-book/jb1-redirect-generator

It’s designed to run in a self-contained way: its dependencies are declared in inline script metadata (PEP 723) at the top of the script. This means you can run it with uv like this:

uv run https://raw.githubusercontent.com/jupyter-book/jb1-redirect-generator/main/generate_redirects.py

This lets you generate redirects from JB1 -> JB2 URL structures and write them into the _build/html folder alongside your JB2-built pages.
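Here is a minimal hypothetical script that illustrates both ideas; it is not the actual jb1-redirect-generator code. The `# /// script` block is PEP 723 inline metadata, which `uv run` reads to set up an environment (this sketch needs no third-party packages). The body writes small HTML pages that send readers from old JB1 paths to new URLs. The `OLD_TO_NEW` mapping and its URL shapes are assumptions for illustration, not the real JB1 -> JB2 rules.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
"""Write meta-refresh redirect pages from old JB1 URLs to new JB2 URLs."""
import html
from pathlib import Path

# Hypothetical mapping: old JB1 page path -> new JB2 URL path
OLD_TO_NEW = {
    "intro.html": "/",
    "content/governance.html": "/governance/",
}

TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="refresh" content="0; url={url}">
<link rel="canonical" href="{url}">
</head>
<body>Redirecting to <a href="{url}">{url}</a>.</body>
</html>
"""


def write_redirects(build_dir: Path, mapping: dict) -> list:
    """Create one redirect page per old path inside build_dir."""
    written = []
    for old_path, new_url in mapping.items():
        target = build_dir / old_path
        # Never overwrite a real page that the JB2 build already produced
        if target.exists():
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        safe_url = html.escape(new_url, quote=True)
        target.write_text(TEMPLATE.format(url=safe_url), encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    for path in write_redirects(Path("_build/html"), OLD_TO_NEW):
        print(f"wrote {path}")
```

Skipping paths that already exist is a deliberate choice in this sketch: if JB2 happens to produce a page at an old URL, the real page wins over the redirect.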

We tested this out by converting the Jupyter Governance docs to Jupyter Book 2 and running it there. You can find a noxfile that runs these commands here:

jupyter/governance/blob/bcdae30efdecbe75bc4751ef1fe1e602fe82ee10/noxfile.py#L25-L37

And its use in a GitHub Workflow here:

jupyter/governance/blob/bcdae30efdecbe75bc4751ef1fe1e602fe82ee10/.github/workflows/deploy.yml#L39-L44

Learn more #

Community learning: Hub config to pass OAuth tokens into user environments

Thu, 06 Nov 2025 00:00:00 +0000

One of our favorite things to see: communities learning from and building on each other’s work!

MAAP recently contributed infrastructure configuration inspired by EarthScope’s approach to handling authentication tokens. Both communities need to pass OAuth tokens into user environments so their SDKs can access protected data - and MAAP adapted EarthScope’s pattern to fit their needs.

This is the kind of peer-to-peer knowledge sharing we hope to foster with our open infrastructure model. When infrastructure is open and communities can see each other’s solutions, they can adapt and build on proven approaches rather than starting from scratch.
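For readers curious what a pattern like this can look like: JupyterHub can persist an authenticator's OAuth tokens in encrypted `auth_state` (enabled with `c.Authenticator.enable_auth_state = True`), and `Spawner.auth_state_hook` is called with the spawner and that state before the user's server starts. The sketch below is not MAAP's or EarthScope's actual configuration. The `access_token` key matches what OAuthenticator typically stores, but the `OAUTH_ACCESS_TOKEN` variable name is a hypothetical choice; use whatever name your SDK reads.

```python
# Sketch for jupyterhub_config.py (hypothetical; not MAAP's or EarthScope's config).
# auth_state must be enabled, which also requires JUPYTERHUB_CRYPT_KEY on the hub:
#   c.Authenticator.enable_auth_state = True
#   c.Spawner.auth_state_hook = pass_token_to_environment


def pass_token_to_environment(spawner, auth_state):
    """Copy the user's OAuth access token into the single-user server's env.

    auth_state can be None when nothing was stored, so guard against it.
    """
    if not auth_state:
        return
    token = auth_state.get("access_token")
    if token:
        # Build a new dict instead of mutating in place, to be safe in case
        # an environment dict set via config is shared between spawners.
        spawner.environment = {**spawner.environment, "OAUTH_ACCESS_TOKEN": token}
```

Keeping the token in `auth_state` rather than in static config means each user only ever receives their own token, injected at spawn time.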

Learn more #

MAAP’s PR adapting the configuration
EarthScope’s original config
Our infrastructure repository where all community configurations live

Acknowledgements #

MAAP team for adapting and contributing this configuration
EarthScope Consortium for the original implementation

Refactoring Jupyter Book 2 documentation ahead of a major release

Sat, 01 Nov 2025 00:00:00 +0000

Documentation is what turns open source code into products that people actually want to use. We recently spent a few days refactoring the Jupyter Book documentation to prepare for the upcoming Jupyter Book 2 release, and we’re excited about how much clearer the docs have become!

What we did #

We restructured the docs using the Diataxis framework to better organize content by user type and task:

Reorganized into clear topic areas with landing pages for easier navigation
Added missing content like the feature voting table and contributing guides
Created upgrade guidance to help users understand the relationship between Jupyter Book 2 and MyST

This work helps users find what they need faster and gives the project a stronger foundation to build on going forward.

Why we’re excited about it #

Better documentation reduces maintainer burden by helping users answer their own questions, and it makes the project more welcoming and useful to new contributors. We hope this makes Jupyter Book more accessible to everyone and lays a good foundation for the new release!

We’re also excited because so many others helped provide edits and comments!

Learn more #

Acknowledgements #

Thanks to @rlanzafame, @FreekPols, and @bsipocz for their helpful reviews, edits, and feedback on the PR!
Project Pythia, CryoCloud, NASA Open Science / ScienceCore, and the Berkeley educational projects are our primary member communities using MyST and Jupyter Book. Their support covers the cost of these kinds of foundational contributions.

2i2c Supports the Science Platforms Coordination IHDEA Working Group

Thu, 30 Oct 2025 00:00:00 +0000

The Science Platforms Coordination IHDEA working group (which includes our own Jim Colliander) is developing international standard software computing environments for Heliophysics. The working group recently presented their work at two major conferences: ML-Helio in Madrid and DASH/IHDEA in San Antonio.

The DASH/IHDEA 2025 conference brings together the heliophysics community to advance data, analysis, and software standards

When the working group received $2k from NASA SMCE for cloud infrastructure, they were already member organizations of 2i2c. This meant we could quickly stand up a JupyterHub with their Heliophysics-tailored environments for the conferences:

Easy access - Shared password authentication for conference attendees
Persistent storage - Work saved across sessions
Serious compute - Up to 119 GB RAM and 15 CPUs (much more than a typical laptop!)

The team successfully demonstrated how cloud resources can enable computational work that laptops simply can’t handle, and conference attendees responded positively to the presentations.

Why we’re excited about this #

This showcases a key benefit we want to create with 2i2c membership: reducing the accidental complexity of leveraging the cloud. Because the working group was already a member organization, deploying and managing infrastructure for the conferences was straightforward once they secured cloud funding. No lengthy setup, no new contracts - just quick deployment of the tools they needed.

Learn more #

ML-Helio Conference - Machine learning in heliophysics
DASH/IHDEA Conference - Data, analysis, and software in heliophysics

Acknowledgements #

NASA SMCE for providing $2k funding and AWS infrastructure
Shawn Polson for being the community champion leading this effort
The IHDEA working group for their collaborative partnership in advancing Heliophysics research infrastructure

BIDS joins as 2i2c's first premier member organization

Wed, 29 Oct 2025 00:00:00 +0000

We’re thrilled to announce that the Berkeley Institute for Data Science (BIDS) has joined as 2i2c’s first premier member organization! This partnership marks a significant milestone in our sustainability strategy and recognizes a relationship that’s been central to 2i2c’s story from the very beginning.

BIDS Executive Director Kirstie Whitaker and 2i2c Executive Director Chris Holdgraf discuss the partnership at the membership launch event.

What this partnership means #

As our founding premier member, BIDS is financially supporting 2i2c while helping us design our member network services and relationships. Together, we’ll work on:

Co-designing member services - BIDS will provide feedback and guidance as we develop how our member network operates
Technical collaboration - Partnering on JupyterHub development, cloud infrastructure improvements, and other open source projects
Strategic input - Advising on 2i2c’s direction and approach to strengthening open source communities

This gives us a foundation for both technical and social collaboration, and we hope it opens doors to deeper partnerships across the Berkeley community.

Berkeley has long been a leader in open source software development. This partnership lets us share our knowledge and support community development of open source infrastructure across institutions.

Kirstie Whitaker, BIDS Executive Director

Why we’re excited about this #

For open source: BIDS has been a leader in supporting open source and cross-disciplinary open science for many years - helping to shape projects like NumPy, scikit-image, NetworkX, and JupyterHub. Their feedback and partnership will help us improve our impact across the entire ecosystem.

For sustainability: BIDS is the first paying member under our new membership model, which is a key part of our long-term sustainability strategy. It demonstrates that organizations value what we’re building and want to invest in shared open source infrastructure.

For 2i2c: BIDS has been part of our story from the beginning, and this partnership recognizes the continuing influence and support we’ve received from the organization.

Learn more #

Read the full announcements from our partners:

Acknowledgements #

The Berkeley Institute for Data Science and the entire BIDS team
UC Berkeley’s College of Computing, Data Science, and Society

NASA Openscapes mentors run airborne data machine learning workshop with 401 participants from 68 countries

Thu, 23 Oct 2025 00:00:00 +0000

NASA Openscapes mentors recently hosted a workshop attended by 401 participants from 68 countries to learn how to use airborne data and machine learning for environmental research!

They gave participants computational environments on a community hub managed by 2i2c to teach data science skills like image classification with airborne spectroscopy and accessing NASA JPL AVIRIS-NG data from the Earthdata Cloud.

We’re proud to enable participants from all over the world to easily access standardized compute and NASA Earthdata data in the cloud for a seamless learning experience.

Learn more #

Acknowledgements #

Thank you to the Openscapes team for sharing this post with us, and to Michele Thornton and Rupesh Shrestha for authoring the blog post.
NASA Openscapes mentors for running a great workshop

A thank you to CZI for its impact on 2i2c and Jupyter

Wed, 22 Oct 2025 00:00:00 +0000

As I attend the CZI Open Science 2025 meeting, it’s a good moment to reflect on the many ways CZI has positively impacted both 2i2c and the Jupyter Project.

The funders who support organizations like 2i2c and Jupyter have a difficult task. In the day-to-day work of building open infrastructure, it’s easy to focus on the next challenge or grant, and lose sight of the cumulative, long-term impact of strategic support. This post is an attempt to pause and celebrate that impact.

CZI has played a unique role in the open source and open science ecosystem since its creation. It has taken an approach to funding and coalition-building that has genuinely changed how many think about supporting open source. Their model has driven an incredible amount of impact, and I’m very grateful for our collaboration.

Here are a few ways their support has stood out.

This was collected quickly, so please let us know about anything we missed! And most importantly, we’re only trying to gather a high-level view here, so we’re including many efforts where all the work was led by teams and organizations other than 2i2c/Jupyter. The Jupyter Community has many leaders that are part of this effort as well.

Strengthening the foundations of Jupyter #

Through its Essential Open Source Software for Science (EOSS) program, CZI has funded several initiatives in the Jupyter ecosystem that have strengthened the project in foundational ways. Here are a few that stand out:

JupyterHub’s Contributor in Residence: Allowed JupyterHub to explore a new maintenance and community support model and support maintainer growth. Georgiana’s journey from this program to becoming 2i2c’s first engineer shows how CZI’s investment in people creates lasting impact.

Community Strategic Lead: Focused on diversity, equity, and inclusion at a strategic level. This created space for us to rethink how we build our teams to be more accessible and equitable. It helped us create pathways for underrepresented groups to become community leaders and led to a key partnership with The Turing Way.

Real-Time Collaboration: Built the foundation for collaborative notebook editing, which is very useful for remote scientific collaboration. This was complex technical work that involved collaboration with many stakeholders. It laid a foundation that Jupyter continues to build upon, and may facilitate AI-based workflows in unexpected ways.

Jupyter Accessibility: Funded key improvements in accessibility for JupyterLab and the broader Jupyter stack, including WCAG compliance, automated testing, and documentation. This was one of the first times an open source community received significant support for accessibility and internationalization, highlighting CZI’s leadership.

They also enabled a wide variety of contributions throughout the Jupyter ecosystem that can be traced back to the capacity that CZI’s funding provided for core support alongside broader initiatives.

Enabling 2i2c to grow from an idea to an organization #

CZI also played a key role in 2i2c’s birth, growth, direction, and impact. When 2i2c was only an idea, the possibility of initial support from CZI was pivotal in helping us turn it into a reality. Then as the organization took shape and started to grow, this support gave us the strategic capacity to develop key frameworks for healthy open source contribution practices, our value propositions, and ideas around community network funding. It was a stepping stone towards building out our own sustainability model via membership.

Here are a few milestone moments:

Our Seed funding (2020): CZI provided core support to bootstrap 2i2c from its inception, funding organizational capacity rather than just technical deliverables. This was critical, enabling strategic partnerships, community coordination, and 2i2c’s first hires (Georgiana Dolocan, formerly the CZI-funded JupyterHub Contributor in Residence, and Chris Holdgraf, 2i2c’s Executive Director). This was the primary funding that drove our creation and operations for the first three years. We published a comprehensive report and retrospective documenting the impact of this seed funding.

Catalyzing Global Communities (2022): This collaborative grant brought together six organizations (2i2c, The Carpentries, CSCCE, IOI, MetaDocencia, Open Life Science) to provide cloud infrastructure and training for communities in Latin America and Africa. The project emphasized “train the trainers” approaches and community co-leadership. It was a much bigger challenge than we all realized, and the relationships between our organizations grew stronger as a result.

Strategic Support for Sustainability (2024): CZI provided a bridge funding gift to sustain 2i2c’s mission during a critical growth phase. This gave us the runway to refine our service model and explore paths to financial sustainability. Combined with funding from The Navigation Fund, this gave 2i2c approximately two years for strategic planning.

What Makes CZI’s Approach Different? #

Here are a few things that have stood out to me about CZI’s unique approach to funding:

Funding the foundation of open infrastructure: Through its EOSS program, CZI recognized that science was building on open source infrastructure that was often under-supported. Explicitly targeting that foundation demonstrated the need to fund core technology, not just new innovation.

Funding communities, not just code: CZI made efforts to support projects that invested in social infrastructure, recognizing that building open source is a deeply social process that needs social skills and capacity to succeed.

Funding connections in a network: CZI took a network approach, actively building connections between its grantees. It flew us together annually for its open science meeting, made targeted efforts for collaborative grants, and connected grantees to work together.

Actively growing that network: CZI made clear efforts to bring in new participants, particularly from communities in Latin America and Africa, and was thoughtful about respecting the agency and leadership of these communities.

Developing its own expertise and understanding: CZI also builds its own tooling, often in partnership with open source projects. This gives its team empathy for our challenges and a deeper institutional understanding of the open source world, leading to more impactful contributions.

Thank You, CZI #

We’re profoundly grateful to CZI for their support for 2i2c, Jupyter, and the broader open source ecosystem. Their investments have enabled Jupyter to serve millions of scientists more effectively, allowed 2i2c to grow from an idea to the organization it is today, and enabled contributions that benefit the entire open science ecosystem.

Thanks, CZI, for all the work you’ve done!

A helpful contribution to our JupyterHub SSH README from OpenScapes

Tue, 21 Oct 2025 00:00:00 +0000

We love when collaborators contribute back to the tools we maintain! Andy Teucher from OpenScapes recently fixed a documentation issue in jupyter-sshd-proxy that benefits everyone using the tool.

jupyter-sshd-proxy is a tool originally created by Yuvi to help 2i2c communities connect to their JupyterHub instances via SSH. Andy ran into an issue when using it with the VS Code fork that uses the open-remote-ssh extension - it failed unless double quotes were used around the authorization token in the ProxyCommand.

Through experimentation, Andy figured out the fix and contributed it back to the README. Now everyone using this tool will have clearer documentation.
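As an illustration of the fix, here is a small hypothetical helper that renders an `~/.ssh/config` entry with the authorization header wrapped in double quotes. It assumes a `websocat`-based `ProxyCommand` shaped like the one in the jupyter-sshd-proxy README (`--binary`, `-H`, `asyncstdio:`, and the `/user/<name>/sshd/` path); please check the exact command against the README rather than relying on this sketch. The host, username, and token values are placeholders.

```python
def ssh_config_entry(host: str, username: str, token: str) -> str:
    """Render an ~/.ssh/config stanza for reaching a hub via jupyter-sshd-proxy.

    Assumption: the ProxyCommand uses websocat, as in the project's README.
    The Authorization header is wrapped in double quotes, which is what the
    open-remote-ssh case described above needed.
    """
    proxy = (
        "websocat --binary "
        f'-H="Authorization: token {token}" '
        f"asyncstdio: wss://%h/user/{username}/sshd/"
    )
    return f"Host {host}\n    User jovyan\n    ProxyCommand {proxy}\n"


print(ssh_config_entry("hub.example.org", "alice", "<YOUR-JUPYTERHUB-TOKEN>"))
```

`%h` is left for ssh itself to expand into the host name, so the same stanza shape works for any hub.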

While small, we think this is a nice example of a “Foundational contribution” from a community:

2i2c creates and maintains open source tools to help our communities
Our communities use those tools and run into issues
They debug, figure out solutions, and contribute improvements back
Everyone benefits from the improvements

This is exactly how we want 2i2c to help our communities - by making it easy for them to contribute back to the ecosystem and strengthen the tools everyone relies on.

Acknowledgements #

Thanks to Andy Teucher for the contribution and debugging!
Thanks to Openscapes for being great collaborators and a place where we can work with people like Andy!

Communities learning from one another - Project Pythia and ICESat-2 Hackweeks

Tue, 21 Oct 2025 00:00:00 +0000

We wanted to share a short vignette about two of our communities learning from one another.

At the latest Project Pythia community meeting, Project Pythia met with representatives from ICESat-2 to share lessons about using notebooks and cookbooks in educational settings.

Anthony Arendt from UW’s eScience Institute shared how they’ve used educational notebooks in their hackweek programs. The discussion explored ways to improve cookbooks, especially for large collections that require different computational environments, sparking ideas about higher-level abstractions for organizing educational content. There is a lot of overlap in the needs and workflows of these communities, and we’re hopeful they can find ways to re-use one another’s ideas, content, and infrastructure.

One of our service goals is to make it easier for our member communities to learn from one another - using standardized tools and infrastructure means we can learn what works, what doesn’t, and collectively improve our workflows more quickly. We’re working on ways to encourage this kind of interaction in our member networks, so we wanted to celebrate this little win.

Learn more about these communities #

Project Pythia - An educational resource for geoscience computing with open-source Python
Project Pythia Cookbooks - Domain-specific example workflows for geoscience
ICESat-2 Hackweeks - Collaborative learning events combining tutorials, peer learning, and team projects

TIL: A few ways to track web traffic for open source projects

Sun, 19 Oct 2025 00:00:00 +0000

Understanding how people discover and navigate your project’s web presence is valuable for open source communities, but there are a lot of options out there and many maintainers may not know about them. Recently Chris did some research to improve the web analytics for Jupyter, and learned about several options for tracking web traffic¹. Here’s a quick report of what stood out.

Three analytics tools we found helpful #

Plausible.io - A privacy-friendly, GDPR-compliant analytics service

Clean interface with public dashboards (see Jupyter’s dashboard)
Paid service, but offers a 15% discount for open source projects
Cost scales with traffic volume. It can get expensive for a project as big as Jupyter!
This is the service we ultimately ended up using…

ReadTheDocs Analytics - Built-in traffic tracking for documentation sites

Available as a free add-on for ReadTheDocs projects, it provides traffic data specific to documentation pages.
There’s no additional cost if already using ReadTheDocs, though if you’re on a business plan you may need to pay for it.
The analytics are a bit barebones, but quite useful for learning where your readers are navigating.
Enable in Settings > Addons > Analytics.

GitHub Repository Analytics - Native analytics in GitHub.

Shows clones, views, and referring sites. This is also fairly barebones, but it’s really useful to see who is actually looking at your repository.
Free for all GitHub repositories.
Access via Insights > Traffic on any repository.
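If you want these numbers outside the web UI, the GitHub REST API serves the same data at `GET /repos/{owner}/{repo}/traffic/views`, covering the last 14 days and requiring a token with push access to the repository. Here is a minimal standard-library sketch; the `summarize_views` helper is hypothetical and the sample payload is illustrative, though its `count`, `uniques`, and `views` fields follow GitHub's documented response shape.

```python
import json
import urllib.request


def fetch_views(owner: str, repo: str, token: str) -> dict:
    """Fetch per-day page views for a repository (last 14 days)."""
    req = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/traffic/views",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def summarize_views(payload: dict) -> dict:
    """Reduce the API payload to totals plus the busiest day."""
    days = payload.get("views", [])
    busiest = max(days, key=lambda d: d["count"], default=None)
    return {
        "total_views": payload.get("count", 0),
        "unique_visitors": payload.get("uniques", 0),
        "busiest_day": busiest["timestamp"][:10] if busiest else None,
    }


# Illustrative payload shaped like the API response (not real data)
sample = {
    "count": 120,
    "uniques": 45,
    "views": [
        {"timestamp": "2025-10-13T00:00:00Z", "count": 70, "uniques": 30},
        {"timestamp": "2025-10-14T00:00:00Z", "count": 50, "uniques": 20},
    ],
}
print(summarize_views(sample))
# {'total_views': 120, 'unique_visitors': 45, 'busiest_day': '2025-10-13'}
```

Because GitHub only keeps 14 days of traffic, a scheduled job that saves each response is the usual way to build a longer history.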

Learn more #

GitHub issue coordinating Jupyter’s analytics work
Plausible.io public dashboard for jupyter.org (this might be down for now, but we’re working to bring it back up)
ReadTheDocs Analytics documentation
GitHub Traffic Analytics API

Acknowledgements #

Thanks in particular to Jason Grout from the Jupyter Executive Council for collaborating on this investigation and helping test these tools.

¹ Chris has been serving on the Jupyter Executive Council as a Foundational contribution. This was related to that effort!

Fixing the mybinder.org usage analytics archive

Tue, 14 Oct 2025 00:00:00 +0000

The analytics archive at archive.analytics.mybinder.org powers the mybinder.org usage dashboards and provides a daily-published dataset that researchers and communities use to understand how Binder is being used across different domains and scientific communities.

While updating our quarterly Binder impact report, we discovered the archive index page had stopped updating. The analytics publisher was writing index files to temporary storage before uploading to Google Cloud Storage, but for some reason the upload step stopped working. We deployed a fix that eliminates the temporary files entirely - the code now generates the HTML index as a string in memory and uploads directly.

The mybinder.org analytics archive shows a list of daily usage reports that anybody can download.

Fortunately, we didn’t lose any data! Thanks to some smart design decisions, the daily analytics files were being collected properly the entire time; only the index page listing them was broken. You can find the full archive here.
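The in-memory approach described above can be sketched like this. It is a hypothetical illustration, not the code from the actual pull request: the file names, bucket name, and HTML layout are assumptions. Only the upload step touches Cloud Storage, using `Blob.upload_from_string` from the `google-cloud-storage` client library.

```python
import html


def render_index(filenames: list) -> str:
    """Build the archive index page as a string; no temporary files needed."""
    items = "\n".join(
        f'<li><a href="{html.escape(name, quote=True)}">{html.escape(name)}</a></li>'
        for name in sorted(filenames)
    )
    return f"<!DOCTYPE html>\n<html><body>\n<ul>\n{items}\n</ul>\n</body></html>\n"


def upload_index(bucket_name: str, filenames: list) -> None:
    """Upload the rendered index straight from memory (needs google-cloud-storage)."""
    from google.cloud import storage  # lazy import so render_index works without it

    bucket = storage.Client().bucket(bucket_name)
    blob = bucket.blob("index.html")
    blob.upload_from_string(render_index(filenames), content_type="text/html")


# Hypothetical daily file names
print(render_index(["events-2025-10-02.jsonl", "events-2025-10-01.jsonl"]))
```

Splitting rendering from uploading keeps the HTML generation easy to test locally, and removes the step where a local file could be written but never uploaded.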

Learn more #

Pull request with the fix
mybinder.org usage dashboards
The binder-data/ repository is where we aggregate and publish archive data to be more accessible.
Our quarterly impact report from mybinder.org

Acknowledgements #

Thanks to the JupyterHub community for their collaboration on mybinder.org infrastructure

Impact report from 2i2c's Binder federation instance

Tue, 14 Oct 2025 00:00:00 +0000

The mybinder.org service provides reproducible and interactive computational environments for the open science community. It is financially supported by a federation of BinderHubs. 2i2c is part of the team that manages mybinder.org, and we’re committed to supporting this critical infrastructure for open science and reproducibility.

We developed a more cost-efficient process for deploying BinderHub on a single VM and are now running a BinderHub at 2i2c.mybinder.org. This post highlights the impact we’ve had through our support of the mybinder.org federation, and we’ll update it periodically.

Usage over time #

Here are weekly launches on mybinder.org. Launches from 2i2c.mybinder.org are in red. Source: mybinder analytics dashboard.

In Q1 of 2025, 2i2c.mybinder.org launched 417,048 reproducible sessions. In this time, mybinder.org was primarily driven by our new Hetzner node as we worked with GESIS to stabilize their own BinderHub instance.

In Q2 of 2025, 2i2c.mybinder.org launched 249,750 reproducible sessions. In this time, we worked with GESIS to deploy their BinderHub instance on the same Hetzner node setup, which let us distribute more of Binder’s load onto them.

In Q3 of 2025, 2i2c.mybinder.org launched 118,083 reproducible sessions. We saw a typical summer dip in sessions, fixed the analytics dashboard, and stopped a TCP scanning abuse case that briefly brought down the 2i2c node.

Where we’ve made improvements #

As part of this effort, we’ve made several improvements to the Binder and JupyterHub ecosystem. Here are a few links where you can read more:

Deploying BinderHub on a single VM with k3s - Our approach to making BinderHub deployment cheaper and simpler
Hetzner cloud infrastructure experience - Cost-effective cloud hosting for mybinder.org
Combating TCP scanning abuse on mybinder.org - Developing anti-abuse tools to detect and stop port scanning from our infrastructure.
Improving Binder’s usage dashboard - This helps us create posts like these and is useful to others as well.
Integrating BinderHub with JupyterHub - Working with GESIS to bring Binder’s dynamic image building capabilities to persistent JupyterHubs, empowering users to manage their own environments

Acknowledgements #

This work is made possible by:

GESIS for their continued support as mybinder.org federation members
The JupyterHub community for collaboration and support
Our member communities whose fees support this work

Combating TCP scanning on mybinder.org with tcpflowkiller

Wed, 08 Oct 2025 00:00:00 +0000

We’ve deployed a new tool to mybinder.org that automatically detects and stops port scanning activity, helping us maintain service reliability while being responsible citizens of the internet.

Port scanning is a common part of network-based exploits, and many server hosts prohibit this activity (including Hetzner, where the 2i2c mybinder.org infrastructure lives). We developed a little tool called tcpflowkiller as part of the cryptnono project (our anti-abuse set of tools for hosted JupyterHub and Binder infrastructure) to automatically kill processes that exhibit port scanning behavior. This reduces the likelihood of triggering our server host’s abuse policies and helps keep mybinder.org running reliably.
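To make "port scanning behavior" a bit more concrete, here is a minimal Python sketch of one heuristic a tool like this could use: flag a process that opens connections to an unusually large number of distinct (IP, port) destinations within a short sliding window. This is not tcpflowkiller's actual implementation. The `ConnectionAttempt` record, the `find_scanning_pids` function, and the thresholds are all hypothetical, and a real tool would also need to observe connections as they happen and then kill the flagged processes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionAttempt:
    """One outbound TCP connection attempt by a process (hypothetical record)."""
    pid: int
    dest_ip: str
    dest_port: int
    timestamp: float  # seconds since some reference point


def find_scanning_pids(attempts, window_seconds=60.0, max_distinct_destinations=100):
    """Return the set of PIDs that contacted more than `max_distinct_destinations`
    distinct (ip, port) pairs within any `window_seconds`-long sliding window.

    The thresholds are made-up defaults for illustration only.
    """
    by_pid = defaultdict(list)
    for attempt in attempts:
        by_pid[attempt.pid].append(attempt)

    flagged = set()
    for pid, events in by_pid.items():
        events.sort(key=lambda a: a.timestamp)
        counts = defaultdict(int)  # (ip, port) -> attempts inside the window
        start = 0
        for event in events:
            counts[(event.dest_ip, event.dest_port)] += 1
            # Drop attempts that have fallen out of the window on the left.
            while event.timestamp - events[start].timestamp > window_seconds:
                old = (events[start].dest_ip, events[start].dest_port)
                counts[old] -= 1
                if counts[old] == 0:
                    del counts[old]
                start += 1
            if len(counts) > max_distinct_destinations:
                flagged.add(pid)
                break
    return flagged


# A process probing 200 ports on one host in 20 seconds vs. one that
# repeatedly talks to the same HTTPS endpoint (documentation-range IPs).
scan = [ConnectionAttempt(1, "203.0.113.5", port, port * 0.1) for port in range(1, 201)]
normal = [ConnectionAttempt(2, "198.51.100.7", 443, float(t)) for t in range(300)]
print(find_scanning_pids(scan + normal))  # {1}
```

A slow scan spread out well beyond the window would not be flagged by this sketch, which is one of the trade-offs involved in picking thresholds.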

Why this matters #

As providers of public compute, it’s our responsibility to make sure people can’t use our infrastructure to abuse others. This is part of being responsible citizens of the internet. It also saves us time in dealing with outages because cloud providers (understandably) block access when they suspect there is abuse.

Hetzner and similar hosts have many benefits (including significant cost savings), and tools like tcpflowkiller help keep hubs and binders running smoothly on such hosts, which have different abuse policies than the big commercial cloud providers.

AWS and other cloud providers have proprietary ways to combat abuse (like AWS GuardDuty). We could have invested our time in developing rules there. Instead, contributing to cryptnono helps provide the same set of features in a cloud-agnostic way, in line with our principles of supporting open infrastructure that gives communities control over their infrastructure.

This tool has now been deployed to mybinder.org, and we’ll monitor its effectiveness over time. We may roll this out to 2i2c public BinderHubs in the future based on patterns we observe.

Acknowledgements #

Thanks to GESIS for their continued support of mybinder.org and to Raniere Silva for collaborating on this deployment with us.
More reliable Binder infrastructure is also supported by NASA Open Science / Science Core, whose tutorials run on the opensci binders that depend on this same anti-abuse stack.

TIL: GitHub Action secrets are only available from non-forked repositories

Wed, 08 Oct 2025 00:00:00 +0000

If you've worked with GitHub Actions in open source projects, you might encounter a hard-to-debug error where repository secrets are simply empty. That's probably because the PR is from a forked repository! Here's what we learned after losing a bunch of time figuring this out:

Our PR from a fork was using empty strings for repository secrets #

github-activity is a tool we help maintain for generating changelogs from a wider variety of contributions than GitHub's defaults. We needed to set a secret (an API token) to raise the API rate limits for the tool's tests. These tests pull data from repositories outside of the github-activity repository itself, so without authentication they quickly hit GitHub's rate limits.

The problem seemed straightforward at first: we added the secret, but the workflow was acting as if the secret didn’t exist at all. After updating the code to explicitly check for empty strings, we discovered the authentication token was actually an empty string, even though it had been set.
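As a minimal sketch of that kind of check in Python, assuming the token reaches the tests through a hypothetical `GITHUB_ACCESS_TOKEN` environment variable, the key point is to treat an empty string exactly like a missing value:

```python
import os
import warnings


def get_github_token(var_name="GITHUB_ACCESS_TOKEN"):
    """Return the token from `var_name`, or None if it is unset *or* empty.

    In GitHub Actions, `${{ secrets.NAME }}` expands to an empty string when
    the secret isn't available (for example, on PRs from forks), so an env var
    wired to it exists but is "". Checking `is not None` alone would miss that.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        warnings.warn(
            f"{var_name} is unset or empty. If this run comes from a forked PR, "
            "repository secrets are not available, so API calls will be "
            "unauthenticated and may hit GitHub's rate limits.",
            stacklevel=2,
        )
        return None
    return token


os.environ["GITHUB_ACCESS_TOKEN"] = ""  # what a forked PR effectively sees
print(get_github_token())  # None (plus a warning on stderr)
```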

After some debugging, we uncovered the root cause: the PR was opened from a fork of github-activity. GitHub intentionally makes secrets appear as empty strings when a PR originates from a forked repository. This is a security measure to prevent unauthorized access to sensitive credentials.

A quick fix: re-open the PR from the base repository #

The immediate fix was simple: re-open the PR from a branch in the base repository rather than from the fork. This worked perfectly, but it's not a sustainable solution for open source projects that rely on community contributions from forks. We don't want to create a dynamic where maintainers follow a different PR workflow from other contributors just because they can push to the base repository.

How to use secrets with forked repositories in a safe-ish way #

If you need to make secrets available to PRs from forks, there are a few approaches we learned about, each with security trade-offs:

The `pull_request_target` workflow #

GitHub provides a pull_request_target event that gives workflows access to secrets even when triggered by forked PRs. These workflows run in the context of the base branch: by default, GitHub runs the workflow and checks out the code from the base branch (e.g., main), instead of any changes your PR introduces. To actually test the PR's changes, the workflow has to explicitly check out the PR's code.

Why this is dangerous: once the workflow runs the PR's code, malicious actors could add code to a PR that exfiltrates your secrets (for example, Python code that prints os.environ["MY_SECRET"]).

As a result, only use secrets that you're OK with being public. In this case, we generated a read-only token with restricted permissions. However, this is still kinda risky, so use it at your own peril.

If your repository workflows require a secret that absolutely cannot be public (e.g., a publishing key for a package repository), try a method like the following:

Using GitHub Environments for granular control #

A safer approach is to use GitHub Environments, which let you restrict which secrets are available to specific jobs. This way, you can ensure that only non-critical secrets (like those needed for testing) are available to jobs that run on forked PRs, while keeping sensitive secrets (like PyPI publishing tokens) restricted to trusted contexts.

This is the approach we implemented in github-activity, and it provides a good balance between security and community contribution workflows. We created a separate environment for publishing to PyPI so that its secret is never available to the job that runs with pull_request_target.

We hope this saves you time! #

Hopefully this learning is useful to others who run into the same confusing behavior. We've added a few improvements to github-activity that check more reliably for empty strings and surface this kind of condition, but knowing the basic behavior of GitHub secrets, environments, and forked PRs is even better.

Enabling transparent cloud cost monitoring with user-level dashboards

Tue, 30 Sep 2025 00:00:00 +0000

We are excited to announce that dashboards to monitor cloud usage and costs at a per-user level are now available! See the cost monitoring documentation for more information.

A key goal of 2i2c is to make the cloud safe for science. By providing transparent cost monitoring, we give communities the confidence that they won’t face unexpected bills and can better understand how their usage patterns translate to cloud costs. This visibility is especially valuable in our shared platform model, where each community gets their own independent hub while benefiting from shared infrastructure expertise.

The user-level cost breakdown allows communities to identify individual usage trends and manage their resources more effectively. Communities can now see exactly how their computational work translates to cloud spending, enabling better resource planning and budget management.

Give us feedback! Click here to provide feedback that will help us make this more impactful.

Learn more #

Cost monitoring documentation

Acknowledgements #

Tarashish @ Development Seed for working on this with us.
NASA VEDA for funding much of this work.
Andy @ Openscapes, Alex @ Development Seed and Sarah @ Earthscope for giving us close feedback.

From scattered effort to strategic impact: How we're systematizing our Foundational open source contributions

Fri, 26 Sep 2025 00:00:00 +0000

Over the past year we’ve experimented with being more strategic about supporting upstream communities as a team. This post summarizes our current plan, including team targets and practices we’ll continue to pilot. We’ll revisit this as we learn more.

Note: This document is about the Foundational contributions we make so that open source communities are healthier and more impactful. It is not about Directed upstream contributions we make as part of our own product work. See On being a good open source citizen: supporting a healthy ecosystem through directed and foundational contributions.

The challenge: Why scattered individual efforts aren’t enough #

Healthy open source communities rely on both individual and institutional contributions. 2i2c aims to be an excellent “upstream citizen”, so we need a structured approach with clear goals and rationale for why it’s the best use of our team’s time.

Without a coordinated approach, we risk two problematic outcomes:

Best case: Scattered, individual efforts that are subject to the Tyranny of Structurelessness. We help at the margins but not meaningfully.

Worst case: Our organizational capacity inadvertently dominates communities, making 2i2c the sole stakeholder capable of meaningful development and maintenance. We functionally take over the project.

By setting explicit goals, both our member communities and upstream projects can hold us accountable for actions that strengthen rather than undermine community health.

Our long-term goal: Multi-stakeholder, resilient communities #

With this in mind, we’ve chosen the following outcomes as our major goals for upstream contribution:

We want the Jupyter¹ community to be a multi-stakeholder², diverse³ community with a very high bus factor, because we believe this is a critical prerequisite for advancing our mission and value proposition.

We want to build team processes that help upstream communities make progress towards this goal, so everyone can equitably participate with the support they need.

Two key objectives #

Starting with JupyterHub, we’ve identified two objectives that will guide our work:

Objective 1: Increase the number of casual but returning contributors to the JupyterHub community

Objective 2: Increase the number of total maintainers in the JupyterHub community

We’ve chosen these objectives because (1) they have impact, (2) we can make meaningful progress on them, and (3) we can integrate this work into our team’s workflow.

For each activity below, we’ve brainstormed some Key Performance Indicators (KPIs) to track progress and ensure we’re learning effectively.

Four pilot activities #

We’ll experiment with these four activities⁴:

Review pull requests from non-maintainers
Issue Triage office hours
Sponsoring and Mentoring new Maintainers
Increase bus factor and diversity of people making releases

Review Pull Requests from non-maintainers #

Imagine two different scenarios:

1. You casually contribute a PR to some OSS project. Someone responds the next day, you have a pleasant back and forth, and it gets merged (or rejected) within a few days.
2. You casually contribute a PR to some OSS project. Nobody responds for a year. Eventually someone leaves a comment. You have forgotten everything, and don't even respond. Much later, your PR gets closed as stale.

Which experience will encourage you to come back and contribute again?

It’s clearly (1). We should use our institutional capacity to bring the community closer to (1).

We’ll accomplish this by including the following work item in every sprint:

Review of N PRs by non-maintainers of JupyterHub

We will build skills (via pairing, training, etc.) inside 2i2c, as not everyone will feel comfortable reviewing pull requests for all projects, nor have rights to merge or close PRs. We may also do additional work like new contributor drives, better documentation, and policy advocacy. We will include pull requests of all types, not just code contributions.

KPIs #

We imagine two KPIs for this activity:

Number of PRs merged (or closed) through our sprint planning activity.
Number of returning contributors whose PRs were reviewed by us.
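As an illustration only, here is a Python sketch of how these two KPIs could be computed from a list of PR records exported from sprint planning. The `PullRequest` record, the example usernames, and the definition of a "returning contributor" (someone who opens another PR after their first PR that we reviewed) are hypothetical choices for this sketch, not an established metric.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical record for a PR opened by a non-maintainer."""
    author: str
    opened: date
    closed: bool            # merged or closed
    reviewed_by_2i2c: bool  # reviewed as part of a sprint work item


def pr_review_kpis(prs):
    """Return (prs_merged_or_closed_after_our_review, returning_contributors)."""
    closed_after_review = sum(1 for pr in prs if pr.reviewed_by_2i2c and pr.closed)

    returning = 0
    for author in {pr.author for pr in prs}:
        authored = [pr for pr in prs if pr.author == author]
        reviewed = [pr.opened for pr in authored if pr.reviewed_by_2i2c]
        # Returning = opened another PR after their first PR that we reviewed.
        if reviewed and any(pr.opened > min(reviewed) for pr in authored):
            returning += 1
    return closed_after_review, returning


prs = [
    PullRequest("alice", date(2025, 1, 10), closed=True, reviewed_by_2i2c=True),
    PullRequest("alice", date(2025, 3, 2), closed=False, reviewed_by_2i2c=False),
    PullRequest("bob", date(2025, 2, 1), closed=True, reviewed_by_2i2c=True),
    PullRequest("carol", date(2025, 4, 1), closed=True, reviewed_by_2i2c=False),
]
print(pr_review_kpis(prs))  # (2, 1)
```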

Issue Triage office hours #

Issue Triage involves combing through an upstream repository's issue tracker, engaging with new issues, refining them to be actionable, and signal boosting important ones for team action. This is hard for newcomers, as it often requires deep knowledge of various components to understand how to direct an issue or refine it. It's also challenging for team members still learning open source community dynamics. We'd like to help people both within 2i2c and in our upstream open source communities build these skills.

As part of our sprints, we will run regular “Issue Triage” office hours. We’ll begin by upskilling our own 2i2c team members in effective issue triaging. We’ll then explore opening issue triage sessions to the broader upstream community.

KPIs #

Number of issues triaged by 2i2c team members.⁵

Sponsoring and Mentoring new Maintainers #

OSS communities must grow their contributors into maintainers, or they will die.

XKCD comic about dependency

Growing new maintainers takes time and effort from both the potential maintainer and existing maintainers who mentor and sponsor them. The focus on sponsorship is important, as laid out by Lara Hogan. This work takes years, not months, to manifest.

We will build structures to identify potential maintainers and create pathways for them to gain maintainership status. As JupyterHub lacks an explicit maintainer pathway, we will build our own process via these focus areas:

1. Identifying potential candidates for maintainership
2. Identifying potential community work they can do to help get involved (contributing bug fixes, code reviewing, issue triage, helping answer questions, contributing code / documentation, release management, etc.)
3. Building pathways for candidates to do (2) as appropriate.
4. Iterating until candidates have done ‘enough’ work to gain maintainership status.

This work is nebulous but worthwhile. We will coordinate this effort closely with community leaders, recognizing it takes time to actualize.

In the Jupyter community, maintainership status is tied to individuals, not to organizations they work for. Nobody should get maintainership status simply because they work for a specific organization (such as 2i2c). We should look for diverse candidates, ideally funded by different organizations, who are interested in becoming maintainers.

Note: We’d also like to start with individuals in our collaborator network. For example, we’re using an engagement between NASA VEDA and Development Seed to onboard several team members into these projects.

KPI #

This measurement moves slowly, but is very clearly impactful:

Number of people who have become maintainers due to our concerted efforts.

Increase bus factor and diversity of people making releases #

Making releases is often thankless but important to community health. It involves coordinating testing, writing changelogs, and providing upgrade instructions. Institutions can help by dedicating team time to perform this task regularly. To advance the ‘multi-stakeholder’ and ‘high bus factor’ aspects of our goal, we will have many different people do releases, via mentorship and sponsorship. This will integrate into our regular workstreams.

KPIs #

Number of releases performed by 2i2c engineers
Number of releases performed by others with sponsorship / mentorship from 2i2c engineers

Criteria for upstream projects to support #

Our long-term goal applies to upstream communities that:

1. We strategically depend on to serve our member communities as part of our community hub service
2. We need to help sustain, given upstream community dynamics
3. We have the ability to help sustain

For example, Kubernetes satisfies (1) but not (2) or (3), while JupyterLab currently meets (1) and (2) but not (3). Currently this policy only applies to JupyterHub, but may change as our organization evolves.

How we’ll implement this #

Who is responsible #

Implementation is the responsibility of 2i2c’s Product & Services team. These activities must integrate into the team’s daily practices, not become an external shadow process for some members.

How we’ll fund this work #

Foundational upstream support requires significant work and expertise. We plan to fund this through:

Fees from our member communities. A percentage of our membership fees covers the cost of Foundational contributions like this.
Targeted contributions from some of our collaborators. Some collaborators have funds and want to support open source at a foundational level; in some cases we use funds from these collaborators to cover our costs.

We still need to explore what these efforts cost and mechanisms to recover those costs.

Next step: Learning in public #

We’re excited to experiment with more effective upstream contribution and eager to learn. We’ll share our experiences so others can learn from and comment on our process.

Acknowledgements #

@MinRK and @bsipocz for helping review a draft of this!
@choldgraf for feedback, guidance, and editing for this post and the team practices in it.
JupyterHub, JupyterBook, and Project Jupyter for teaching us a lot about open source over the years.

¹ Currently this means JupyterHub in particular, plus Jupyter-wide leadership. We're exploring how to incorporate JupyterBook into our service and are thus investing Foundational contributions there as well.
² With different kinds and sizes of organizations (companies, non-profits, universities, etc.) and individuals being stakeholders. We want to avoid a single organization monopolizing power within any community.
³ Across the power spectrum - from users to bug reporters to casual contributors to maintainers to people on governance duty.
⁴ Implementation note: We will not start doing all these immediately! We will consult with the rest of the team, and start these one at a time so we can build these processes sustainably and equitably.
⁵ This requires a definition of “an issue that has been triaged”, and to our knowledge no such definition exists. We'd like to learn how to measure something abstract like “issue triage” - perhaps something specific, like putting it on a board for further action or applying a label, or something more abstract like “increasing how clear and actionable the issue is”. We'll explore this when we start to make progress towards this objective.

Updates from Chris' position on the Jupyter Executive Council and Foundation Board

Tue, 23 Sep 2025 00:00:00 +0000

This is a running blog post for Chris to share community updates from his time on the Jupyter Executive Council. See the section below for context on this page.

Updates for the Jupyter community from Chris #

Chris aims to write monthly updates for Jupyter’s community in the Jupyter Community Forum to share his perspective on what the council is up to. We’ll update this index post as new posts are available.

Blog posts about Chris’ experience on the executive council #

Chris blogs now and then to share his thoughts, important announcements, and provide major updates from his perspective on the executive council. Here are the posts that have come out of this effort.

Your Ideas, Our Support: Jupyter Community Call For Funding Proposals
Why open source foundations try to fund systems, not development - May 31, 2025
Jupyter can align the needs of its community and its foundation by enabling contribution - Mar 22, 2025
The relationship between the Jupyter Executive Council, Software Steering Council, and Foundation - Mar 02, 2025
Ways the Jupyter Foundation could support open source projects - Feb 26, 2025
Running for the Jupyter Executive Council - Jan 14, 2025

2i2c’s commitment to supporting open source communities #

In early 2025, Chris was elected to the Jupyter Executive Council. We’re tracking these efforts because providing leadership and community support to open source projects is a key Foundational contribution we wish to make for open source projects. 2i2c makes time for team members to contribute upstream in strategic ways like this as part of our commitment to open practices. However, this kind of work is very difficult to track! This blog post is an attempt at putting these efforts into one place for us to track and for others to discover.

Acknowledgements #

2i2c's member communities support Foundational open source contributions like this with their membership fees.
The Navigation Fund provides strategic funding support for 2i2c which covers major open source leadership contributions like this.

Incident report: UC Merced user throttling during class startup

Tue, 16 Sep 2025 00:00:00 +0000

On August 29, 2025, our cloud infrastructure team experienced an incident with the UC Merced community hub when students tried to log in simultaneously at the start of class. For more detailed technical information about this incident, see our full incident report.

What happened #

Students experienced issues when trying to log in to the hub at the same time at the start of class.
The concurrent spawn limit was reached quickly due to the large number of users starting up simultaneously.
New nodes had to be brought up by the autoscaler, which took roughly 10 minutes from start to end.
Users who tried again after 1 minute weren’t guaranteed to get their servers started immediately since new nodes were still spinning up.
This was an “expected” scale-up event but the lack of clear messaging caused users to interpret it as instability.

What we learned #

We need better communication so users understand when infrastructure slowness is “expected” vs. “unstable”.
We need better alerting for concurrent user startup throttling - we found out about this issue from users rather than automated monitoring.
We learned that JupyterHub’s metrics don’t properly expose 429 status codes in our dashboards.
This will happen again if we don’t have proper scaling limits and node provisioning strategies for sudden user influxes.

Resolution #

We implemented several fixes:

Increased the concurrent spawn limit from 64 to 100.
Put UC Merced users on larger nodes to reduce the number of node spin-ups needed. This will cost more in cloud spend but result in fewer scale-up events.
Created action items to improve logging, alerting, and monitoring for similar incidents.
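To illustrate why these two changes help, here is a deliberately simplified Python model of a class-start burst. In JupyterHub the relevant setting is `c.JupyterHub.concurrent_spawn_limit` (exposed in Zero to JupyterHub as `hub.concurrentSpawnLimit`), and spawn requests beyond it receive an HTTP 429 response. The class size (150 students) and node sizes (8 vs. 32 users per node) below are hypothetical, the 10-minute boot time comes from the incident, and the model assumes every admitted batch of spawns waits on freshly booted nodes; it is not how JupyterHub or the autoscaler actually schedule work.

```python
import math


def simulate_class_start(n_students, concurrent_spawn_limit):
    """Everyone clicks "start" at once while nodes are still booting.

    Simplification: pending spawns don't finish during the burst, so anyone
    beyond the limit is turned away with HTTP 429.
    Returns (pending_spawns, rejected_with_429).
    """
    pending = min(n_students, concurrent_spawn_limit)
    return pending, n_students - pending


def nodes_needed(n_students, users_per_node):
    """Nodes the autoscaler must bring up to fit every student."""
    return math.ceil(n_students / users_per_node)


def estimated_wait_minutes(n_students, concurrent_spawn_limit, node_boot_minutes=10):
    """Rough time until the last student starts, if each admitted batch of
    spawns has to wait for freshly booted nodes before the next batch fits."""
    batches = math.ceil(n_students / concurrent_spawn_limit)
    return batches * node_boot_minutes


for limit in (64, 100):
    print(limit, simulate_class_start(150, limit), estimated_wait_minutes(150, limit))
# 64 (64, 86) 30
# 100 (100, 50) 20
print(nodes_needed(150, 8), nodes_needed(150, 32))  # 19 5
```

Raising the limit shrinks the number of students bounced with a 429 and the number of waiting rounds, while larger nodes cut the number of machines the autoscaler must bring up, at a higher cloud cost.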

Acknowledgements #

Thanks to UC Merced students and instructors for reporting the issue through our support system.

We're going to try blogging about our work more often

Wed, 10 Sep 2025 00:00:00 +0000

At 2i2c, we aim to be an exemplar organization in working openly, supporting open science, and supporting open source communities in everything that we do. We believe that open science is a process, not a product, and commit to following practices that align with open principles throughout our work.

A key principle for our team has always been working in the open. We do almost all of our work in our own public repositories or in upstream community spaces. However, we still struggle to communicate what we've been up to. We can't expect everybody to keep an eye on the spaces where 2i2c operates in order to learn whether we are actually living up to our goals.

So, we're going to try an experiment that is inspired by Simon Willison's blog. In short, here's what we aim to do:

Use our blog any time we learn or do something that’s aligned with our value proposition.
Don’t hesitate to share in-progress work and things that are still rough - this is part of working in the open.
Occasionally, share posts that help readers navigate this stream of short impact posts. We’ll use our Mailing List for this, so click that link if you’d like to sign up.

Our hope is that this will give us a steady stream of posts that show what we’ve been up to, as well as a way to navigate those posts via our mailing lists for the folks that don’t want to spend all day combing through our blog. :-)

Acknowledgements #

Support for organizational and strategic work like this is provided by a grant from The Navigation Fund.

On being a good open source citizen: supporting a healthy ecosystem through directed and foundational contributions

Wed, 03 Sep 2025 00:00:00 +0000

Any organization building on open source faces a fundamental tension: how do you serve the needs of your organizational stakeholders while also acting as a responsible steward of the upstream projects you depend on? This is harder than it looks - simply “making PRs” leaves a number of open source needs unaddressed, and can burn out both your team members and the open source maintainers. We think about this a lot at 2i2c, and want to share our framework to navigate this challenge intentionally.

Here are a few questions we’ve been grappling with:

How do we tie general upstream maintenance to value delivered to our user communities?
How can we scope upstream support so that it doesn’t detract from our service needs and product strategy?
How can we encourage team members to work on the most impactful aspects of upstream support?
How can we intentionally and equitably support open source communities as a team, rather than a collection of individuals?

Along the way, we realized there are two very different kinds of upstream contributions:

Directed Contributions: A contribution driven by the needs of our member communities and product roadmap. We call these “Directed” contributions because they address a targeted need driven by one stakeholder (us!).
Foundational contributions: A contribution driven by the needs of the upstream community. We call these “Foundational contributions” because they’re meant to provide the healthy foundation on which a community can operate and grow.

Historically we have conflated these types of contributions, but we think it’s key that we treat them differently.

Note: For a more practical guide that describes the systems we’ve set up to accomplish Foundational upstream contributions, see From scattered effort to strategic impact: How we’re systematizing our Foundational open source contributions.

Everybody has an open source hat and a stakeholder hat #

Open source teams¹ are usually two kinds of teams that overlap heavily:

A collection of stakeholders working together on the open source project, each with their own goals and interests.
An open source team with a shared goal and strategy for the open source project.

In this case, stakeholders can be individuals or companies. They use and contribute to the open source project because it advances their own interests. For example, an enthusiast contributing to a project because it brings them joy, or a company contributing to a project because they build a product that depends on the open source technology.

However, for open source projects to be successful they also need their own unique identity, goals, strategy for impact, and system of work. This allows a diverse collection of stakeholders to work together effectively and create impactful technology. This team is made up of the same stakeholders described above, but with a responsibility to lead and support the open source team, rather than just serve their individual interests as stakeholders.

Thus, any open source stakeholder has two hats: they are both representatives of a stakeholder and members of an open source team. While it’s possible to align the interests of these two groups, we think it’s still important to distinguish between them.

Directed Contributions benefit the stakeholder you represent #

A Directed Contribution is primarily driven by the needs of a stakeholder in an open source project. To use 2i2c as an example, let’s take a quote from 2i2c’s value proposition:

2i2c serves a global network of community hubs for interactive learning and discovery

Community here does not refer to open source upstream software provider communities (like JupyterHub or Kubernetes), but instead to downstream user communities (like CryoCloud, Openscapes, or NASA VEDA).

When 2i2c makes a Directed Contribution, it means we are trying to deliver value to one or more of our member communities by making an upstream contribution.

Satisfying community needs often involves directly working on the software they use. Guided by our right to replicate principles, we mostly do this work on software that is neither proprietary to 2i2c nor permanently owned solely by us, contributing to upstream software communities instead. These are all Directed Contributions.

Some illustrative examples:

Allow login to be gated on OAuth2 granted scopes was a feature we added to support one of our communities' auth flows (EarthScope).
Changing how .pyc files are kept in images was work we did as a result of a support ticket investigating spawn timeout issues in the LEAP hub.
Adding landing pages functionality to Jupyter Book and MyST was work we did to support member communities like CryoCloud and Project Pythia.

The fact that these are open source contributions is incidental. We are primarily doing this work to deliver value to our community network.

We plan Directed Contributions according to our roadmap and member feedback #

Directed Contributions naturally align with 2i2c’s overall goals and strategy, so we use our product processes for planning and delivering on them. However, we also want to provide transparency to upstream communities so that they understand who is driving the contributions that we’re making.

With that in mind, here are a few ways that Directed Contributions relate to our practices:

Directed Contributions should be defined by our product roadmap and prioritization processes.
We allocate engineering time for these upstream contributions as part of our product lifecycle, including the extra coordination and communication work needed to work at the pace of the upstream community.
We cross-link 2i2c product initiatives to upstream issues and pull-requests wherever we can to provide transparency about why we’re making a contribution.
We communicate this work via our blog so that 2i2c’s member communities know about the contributions we’ve made on their behalf.

Foundational Contributions support a healthy open source community #

However, if contributions are always driven by stakeholders' needs, the open source team will not have an identity or support structure of its own. Here's another excerpt from our value proposition:

We need infrastructure services that are driven by community needs and values, that follow the same open source science practices we wish to see in others, and that believe in the power of shared community resources and knowledge.

Being a “healthy upstream citizen” is core to 2i2c’s mission, and is also a way to help communities we rely on remain healthy. Some of our contributions should be Foundational rather than Directed. This means doing things that keep the overall ecosystem healthy even if it does not directly address a specific member community need. The presence of a healthy open source ecosystem is a value to our member communities in-and-of itself.

Defining “Foundational” needs is difficult, because open source teams tend to have less structure and fewer formally stated goals and needs than most organizations. In 2i2c's case, we focus our Foundational Contributions around maintaining the health of the open source ecosystem.

Foundational work includes things like:

Grow and guide new contributors to increase team capacity
Help make releases
Provide code review
Fix broken CI
Write documentation and tutorials
Manage and run meetings
Align open source teams on goals and strategy

However, the real point is that these actions need to be driven by the upstream project’s goals and needs, not by 2i2c’s needs.

Here are a few common examples of contributions that are not considered Foundational for our team:

Opening a PR to add a major feature to an upstream project.
Creating a brand new project in an open source organization in order to scratch your own itch.
Engaging in reactive open-source work that isn’t driven by a clear strategy or goal (e.g., randomly responding to the last few GitHub issue comments you happened to notice).

We plan Foundational Contributions alongside our engineering roadmap #

Foundational Contributions are important to 2i2c for both strategic and tactical reasons. However, when this work is left to unstructured time (as it has been historically), it runs into all the problems of unstructured work: it happens in non-strategic ways, it isn’t evenly balanced across team members, it is more or less accessible depending on your personal comfort level and skills, etc.

With that in mind, here are a few ways that Foundational Contributions relate to our practices:

We need to own Foundational Contributions as a team, rather than asking individuals to identify and do this work on their own.
We need to define team goals and strategy to define the impact we want to have, and what kind of work leads to that impact.
We need a team system for identifying and prioritizing the most impactful Foundational Contributions to perform.
This system must spread the responsibility of Foundational Contributions across our whole product team.
It means we need to give people support and training to do this effectively. For example, helping team members grow into roles that involve upstream work, rotating certain types of contributions across team members, etc.

To ensure this work is intentional and equitable across our team, we encourage Foundational Contributions to happen within this framework. Contributions that fall outside of it are treated as valued, but separate, personal contributions.

What’s next #

By distinguishing between Directed and Foundational contributions, we can align and balance our immediate product needs with our long-term commitment to community health. We believe this framework allows organizations like ours to be better partners. We’d love feedback about this process, how we can improve it, and what others have learned along the way.

By “open source” we are focusing on multi-stakeholder open source projects with participatory and inclusive leadership and contributions. This wouldn’t apply to an organization- or person-specific open source project.

Sharing JupyterHub's vision for more flexible application deployment at the doepy talk series.

Wed, 03 Sep 2025 00:00:00 +0000

Our Technical Lead Yuvi Panda recently gave a talk at the doepy meetup about JupyterHub’s interest in moving beyond the “single-user notebook application” and towards a more flexible approach to enabling administrators to deploy many different types of applications and environments.

Check out a video of the talk here:

This is an important step for the JupyterHub project in order to support the many different kinds of workflows that data scientists need to use in their work. We hope that this generates more interest in the JupyterHub project and gives us useful feedback to guide the team’s understanding of this direction.

Learn more #

Acknowledgements #

The doepy team for inviting Yuvi to give this talk.
The JupyterHub team for working with us on this strategy.
2i2c’s network of member communities whose fees support our Foundational open source engagement.

Overhauling repo2docker's documentation

Fri, 01 Aug 2025 00:00:00 +0000

Documentation is incredibly important for open source projects to communicate their value and show users how to make the most of their tools. However, it’s one of those things that often gets de-prioritized with all of the other work that needs to happen in a project.

We are heavy users of the repo2docker project in 2i2c’s service. It allows you to dynamically build an environment image that can be hosted in a cloud service like JupyterHub. It’s the underlying tool used by BinderHub, and is the focus of recent work on enabling dynamic image building in a JupyterHub (more on that to come).

As part of this work, we decided to do a small overhaul of repo2docker’s documentation, with the goal of making it easier to discover, navigate, and learn from. Here’s how the landing page looks now!

The new repo2docker landing page now looks more like an actual landing page!

We hope this makes repo2docker a more useful tool for everybody, and also gives us more confidence pointing our communities to the repo2docker documentation in their community workflows.

Acknowledgements #

Thanks to the NASA VEDA project and to NASA Open Science for providing funding and collaboration for this work.
Thanks to the JupyterHub community for collaboration and review of this work.

2i2c's submissions to JupyterCon 2025

Sat, 19 Jul 2025 00:00:00 +0000

Update: The talks are now live! See this blog post with links to our videos.

We were excited to hear that JupyterCon is happening again in 2025. The Call for Proposals just wrapped up, and our team was involved in preparing and submitting several proposals, both directly from 2i2c and from the broader ecosystem.

Enjoy brief summaries of the proposals we contributed to below. Let us know if there are talks you’d like to hear from us in the future!

Note: Many of these submissions were written in collaboration with others in the open-source projects we participate in, particularly JupyterHub and Jupyter Book.

Not just for notebooks: JupyterHub in 2025 #

JupyterHub: A multi-user server for Jupyter notebooks

This is how JupyterHub was described when it was announced in 2015, 10 years ago. The focus was on bringing Jupyter Notebooks to multiple users on shared infrastructure. The Jupyter Notebooks focus was so strong that Jupyter was even in the name of the project! Fast forward 10 years, & this is still the most common perception of JupyterHub.

However, this has not been true for a long time now. Instead of setting up 5 different kinds of infrastructure to support your users based on which interface they like to use for their interactive computing (JupyterLab, RStudio, Linux desktop tools like QGIS or Napari, Visual Studio Code, full ssh (!?), etc.), you can set up a JupyterHub that supports all of those! Meet your users where they are, rather than forcing them to use only a specific set of tools.

Come to this talk to:

See cool demos of various popular applications running on JupyterHub seamlessly
Understand the security model of JupyterHub & how that enables these cool demos
Learn how you can set up your own application to run in JupyterHub
Influence the future of how JupyterHub is marketed

Cloudy with a Chance of Savings: Per-User Usage and Cost Monitoring for JupyterHubs in the Cloud #

Cloud cost monitoring is moving beyond just preventing runaway cost explosions – it’s about empowering JupyterHub administrators with the guardrails they need to run efficient, transparent, and sustainable infrastructures. A cloud cost bill can show a broad view of services and machines provisioned, but how can we provide granular insights into each user and the value they are deriving from the hub on an application level?

We’ve developed several open-source components towards answering this question:

Metric Collection – Prometheus collects resource usage metrics (including CPU, memory, and storage) from individual user pods via standard and custom exporters.
Cost Estimation – Usage is correlated with AWS cost data to estimate per-user costs.
Visualization – Grafana dashboards display rich, interactive views of usage and cost data, making it easy to monitor trends, identify high-cost workloads, and generate reports for funders and decision-makers.

This approach delivers cloud observability and cost transparency that can be reliably deployed using Kubernetes and integrated with Zero to JupyterHub distributions.
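The abstract does not say exactly how usage is correlated with cost data. As an illustration only, here is a minimal sketch of one common approach: splitting a known total cost across users in proportion to a weighted blend of their CPU and memory usage. The `PodUsage` type, the `estimate_user_costs` function, the 50/50 CPU/memory weighting, and the numbers are all hypothetical. They are not taken from 2i2c’s components.

```python
from dataclasses import dataclass


@dataclass
class PodUsage:
    """Hypothetical per-user usage totals for one billing period."""
    user: str
    cpu_core_hours: float
    memory_gib_hours: float


def estimate_user_costs(usages, total_cost, cpu_weight=0.5):
    """Split total_cost across users in proportion to a weighted blend of
    each user's share of CPU and memory usage (an assumed allocation rule)."""
    total_cpu = sum(u.cpu_core_hours for u in usages)
    total_mem = sum(u.memory_gib_hours for u in usages)
    if total_cpu == 0 and total_mem == 0:
        raise ValueError("no usage to allocate cost against")
    # If one resource saw no usage at all, put all the weight on the other
    # so the per-user shares still add up to the full cost.
    if total_cpu == 0:
        cpu_weight = 0.0
    elif total_mem == 0:
        cpu_weight = 1.0

    costs = {}
    for u in usages:
        cpu_share = u.cpu_core_hours / total_cpu if total_cpu else 0.0
        mem_share = u.memory_gib_hours / total_mem if total_mem else 0.0
        share = cpu_weight * cpu_share + (1 - cpu_weight) * mem_share
        # A user may have several pods, so accumulate rather than overwrite.
        costs[u.user] = costs.get(u.user, 0.0) + share * total_cost
    return costs


usages = [PodUsage("alice", 3.0, 8.0), PodUsage("bob", 1.0, 8.0)]
print(estimate_user_costs(usages, total_cost=100.0))
# {'alice': 62.5, 'bob': 37.5}
```

A proportional split keeps the per-user estimates summing to the real bill. The weighting between CPU and memory is a policy choice that each hub would need to tune.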

Controlling home directory costs (with user empathy) on the cloud with jupyterhub-home-nfs #

User home directories on JupyterHubs deployed in cloud providers have been a pain point for both end users and administrators. Administrators feel the pain of cloud costs for home directory storage (sometimes higher than compute!). No user wants to receive an email saying “well, this code you copy pasted downloaded 3TB of netcdf files into your home directory, and now we have used up our entire team’s cloud budget for the next 2 years” (true story).

The jupyterhub-home-nfs open source project is a JupyterHub-native, cloud-agnostic solution to these problems. Administrators can set per-user limits, tune performance, report on usage, and make cost-conscious choices around overprovisioning. It gives users empathetic guardrails that prevent overuse, rather than punitive gates that zap them after the fact.

In this short talk, we will:

Describe the core of the problem, and how it manifests for users and admins
Review current solutions and their limitations
Introduce jupyterhub-home-nfs and how it moves the solution space forward
Demo what this looks like for an end user
Talk about future direction, and opportunities for collaboration

Rebuilding user trust in `mybinder.org` #

Have you clicked a mybinder.org link, waited for it to start, and given up because it took far too long? Or watched it just fail?

Have you stopped using mybinder.org for your tutorials, repositories and presentations because you could no longer rely on it to work each time?

mybinder.org is open infrastructure run by incredible volunteers (who go above and beyond constantly) in the Jupyter community. Is ‘slow fade into unreliability’ just the fate of all openly run infrastructure?

But perhaps, just maybe, you have tried again recently & noticed improvements! Launches are more reliable. Faster. The UI looks better. Maybe things are getting better?

This talk will go through the problems facing mybinder.org & what we are doing about it. Come to this talk to find out:

How is mybinder.org run?
What are the structural issues facing open infrastructure services like mybinder.org?
What sustainability experiments are we running to improve reliability and rebuild user trust?
Has reliability actually improved?
How can I help?

Can your JupyterHub handle your workload? Performance testing with `jupyterhub-simulator` #

Your JupyterHub is all set up, and you’re excited to use it for your workshop of 60 students. Or your class of 600 students. Or your research group of 5 people with complex workflows.

You feel your infrastructure should hold up fine, but do you know if your infrastructure will hold up fine? Is that just excitement, or is there a little bit of nervousness in there too? Wouldn’t it be nice to test and know for sure?

jupyterhub-simulator is an open source project that allows you to describe what you expect your users to be doing - starting servers, clicking on nbgitpuller links, running notebooks, etc. Once you have described it, you can then simulate any number of users doing that workflow simultaneously, and verify that your JupyterHub can handle that workload. If it can’t, tweak your infrastructure and try again until it does!

In this talk you will learn:

What is jupyterhub-simulator?
How can I describe my expected workflow?
How can I test if N users can do this workflow all simultaneously?
How can I visualize the performance of my infrastructure so I can tweak its configuration and try the simulation again?
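To illustrate the core idea of describing one workflow and running N copies of it at once, here is a minimal sketch using Python’s `asyncio`. This is not jupyterhub-simulator’s API. The step functions (`start_server`, `pull_content`, `run_notebook`) are hypothetical stand-ins that only sleep, where a real test would make HTTP requests against the hub.

```python
import asyncio
import time


# Hypothetical stand-ins for real hub interactions; a real load test
# would make HTTP requests to the hub instead of sleeping.
async def start_server(user):
    await asyncio.sleep(0.01)


async def pull_content(user):
    await asyncio.sleep(0.01)


async def run_notebook(user):
    await asyncio.sleep(0.02)


# The "description" of what a user does: an ordered list of steps.
WORKFLOW = [start_server, pull_content, run_notebook]


async def simulate_user(user, workflow):
    """Run each step of the workflow in order, recording how long each took."""
    timings = {}
    for step in workflow:
        t0 = time.perf_counter()
        await step(user)
        timings[step.__name__] = time.perf_counter() - t0
    return user, timings


async def simulate(n_users, workflow=WORKFLOW):
    """Run n_users copies of the workflow concurrently."""
    tasks = [simulate_user(f"user-{i}", workflow) for i in range(n_users)]
    return dict(await asyncio.gather(*tasks))


if __name__ == "__main__":
    results = asyncio.run(simulate(60))
    slowest = max(sum(t.values()) for t in results.values())
    print(f"{len(results)} users finished; slowest took {slowest:.3f}s")
```

Recording per-step timings, rather than a single total, makes it easier to see which part of the workflow degrades as the number of simulated users grows.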

Introducing Jupyter Book 2.0 #

This is a community talk from the Jupyter Book team, detailing the principles behind the new MyST Document Engine and Jupyter Book 2’s upcoming release. We’ll share the text when the Jupyter Book team posts it publicly.

You don’t need to contribute more to open source, you need to go to therapy #

Open source communities can be incredible and irreplaceable sources of human connection in our lives, offering a unique kind of fulfillment hard to find elsewhere. This feeling of fulfillment and approval can, for some people, be a soothing balm in an otherwise rough life. However, it has to be part of a healthy, balanced ecosystem of different kinds of connections offering different kinds of fulfillment. If a significant chunk of fulfillment in your life comes from the open source work you do, unbalanced by other sources, that can quickly become unhealthy for both you and the community. Disagreements are more likely to become high stakes. Interactions can quickly become emotionally charged and filled with hard-to-interpret subtext. This can both burn you out and drive away potential new community members.

This talk explores why emotional regulation is a critical skill for participating in open source communities, and how therapy can be a tool for learning that skill. Come to learn:

What is emotional regulation?
Why should I care?
What is therapy and how can I access it?
Wait that’s not what I thought therapy was! Are you telling me Missy Armitage lied in Get Out?!
What negative effects do communities feel?

What 2i2c has learned while trying to build sustainable relationships with Jupyter’s community #

2i2c is a non-profit organization that fosters co-creation and collaboration between science communities and open source communities. We are deeply embedded in an international network of research and education communities, as well as open source communities that underlie their infrastructure (particularly in the Jupyter ecosystem). Our technical infrastructure is built entirely with open source components that we contribute to, but do not control. This is a really hard problem to solve!

We’ve learned a lot in the first four years of our existence. This talk will describe how our organization approaches a healthy and productive open source relationship with the Jupyter (and broader scientific python) ecosystem. It’ll cover some of the major mistakes we’ve made, lessons learned, and where we think we’ve had impact. We aim to make this talk full of practical learnings that others can follow in building sustainable open science organizations that contribute to a healthy and vibrant open source ecosystem. Our goal will be to provide inspiration to others that are interested in building on top of open source projects like Jupyter, and want to do so in a way that is healthy and sustainable.

Open slides: Jupyter Book 2 and MyST at the UC Berkeley Data Science Education Summit

Tue, 01 Jul 2025 00:00:00 +0000

Chris gave a talk about Jupyter Book 2 and MyST at the UC Berkeley Data Science Education Program’s annual meeting. It covered the next direction for the Jupyter Book project, and its recent adoption of the MyST Document Engine for Jupyter Book 2.

You can view the full slide deck here.

Learn more #

Acknowledgements #

Thanks to Project Pythia for funding some of our work on the Jupyter Book and MyST ecosystem
Thanks to CloudBank for collaborating with us on adapting and deploying Jupyter Book for education and organizing this summit
Thanks to the Jupyter Book project for collaborating with us on these strategic efforts over the last years

Jupyter Book at the Scientific Python 2025 Developer Summit

Fri, 23 May 2025 00:00:00 +0000

Chris and Angus recently attended the Scientific Python 2025 Developer Summit on behalf of Jupyter Book. Here’s a brief blog post about their experience, written with the Jupyter Book team.

Offering Jetstream2-powered hub support at 2i2c

Mon, 28 Apr 2025 00:00:00 +0000

When we first committed to offer Jetstream2 support at 2i2c, Jetstream2, Magnum, OpenStack, and ClusterAPI were all new concepts that we hadn’t used at 2i2c before. And although the initial exercise of reading about each of them independently was confusing, learning how they actually glued together was the key. This post is about Jetstream2, 2i2c’s persistent hub offerings, and what we learned in the process.

⭐ Members of 2i2c’s community network can determine their eligibility and learn about JetStream2 in our supported cloud providers documentation. If needed, reach out to 2i2c for support.

Context #

At 2i2c, we want to be able to deploy k8s clusters on different cloud providers. At a high level, we use:

Infrastructure as code to describe, deploy and manage the actual physical infrastructure from the cloud providers
Cloud specific CLI to authenticate to this infrastructure
Helm to deploy and manage k8s resources onto this infrastructure
And finally kubectl to interact with all of these k8s resources

(Main tools used at 2i2c to deploy and manage k8s clusters on different cloud providers)

On cloud providers like GCP, AWS, and Azure, Kubernetes support feels like an atomic feature of the cloud provider and works out of the box. On Jetstream2, however, k8s support is not a single built-in feature; it is assembled from several layers.

Jetstream2 Kubernetes support stack #

Jetstream2 is a collection of supercomputers that are part of the ACCESS cyberinfrastructure. This ACCESS infrastructure groups together supercomputers like Jetstream2 (but not limited to it) into a mesh that creates the impression of a single, virtual system that scientists can openly access and interactively use.

It offers Infrastructure as a Service (IaaS), that allows users to deploy VMs and manage environments dynamically. And the piece that enables this Infrastructure as a Service feature is OpenStack.

OpenStack and Magnum #

OpenStack is an open source platform made of multiple projects that help build and manage both private and public cloud infrastructure.

For our use case, one of the most relevant OpenStack sub-projects is Magnum. Magnum offers container orchestration engines for deploying and managing containers, such as Kubernetes (though it is not limited to it).

Initially, Kubernetes support was provided through a project called Heat. However, this proved harder to manage and maintain, and it was extremely hard to upgrade a cluster. So they migrated to a new driver, the Cluster API Magnum driver, which offers a more native k8s integration.

Cluster API and CAPI helm driver #

Cluster API (CAPI) itself is a Kubernetes project that lets you declare k8s clusters in a straightforward way.

The Helm driver, on the other hand, acts as a bridge between OpenStack’s Magnum and Kubernetes’ Cluster API (CAPI). Its main goal is to manage the lifecycle (create, scale, upgrade, destroy) of Kubernetes-conformant clusters using a declarative API.

In order to do this, Cluster API provides an API for being able to manage the various components of a Kubernetes cluster. This conceptually looks like a Kubernetes cluster managing other Kubernetes clusters; the former, named the ‘CAPI management cluster’, is the one providing the API for managing the latter workload clusters.

Decomposing the previous atomic feature #

(Comparison between Jetstream2 and other cloud providers when it comes to k8s support)

Magnum is part of the OpenStack tent and it’s the first layer on top of Jetstream2 towards achieving k8s support.

The CAPI helm driver is what’s offering CAPI support. This is the last piece that’s needed to link a k8s cluster down to the hardware where it’s deployed, on Jetstream2.

Challenges #

The Jetstream2-OpenStack stack is not a simple one. It’s a complex stack of technologies, and each of the connection points can be challenging to debug and fix when something doesn’t work, especially when you are one of the first to pilot this new Magnum driver setup.

So we expected to face some issues along the way. However, we were able to work around them and add Jetstream2 to our service menu. Below are some of the issues that we faced:

We have to create Terraform resources in sequence, which takes longer, because a race condition causes concurrent nodegroup creation requests to fail

bugs.launchpad.net/magnum/+bug/2097946

The roles and labels of the nodegroups don’t get propagated to the actual nodes, so we cannot apply our own labels to nodes when they are created

azimuth-cloud/capi-helm-charts#84

The node count and min node count cannot be set to 0 and each nodegroup has to have at least 1 node

bugs.launchpad.net/magnum/+bug/2098002

A default-worker nodegroup is created in addition to the default control plane nodegroup, and we cannot delete it due to the same issue as in 2.
The latest CAPI Helm chart version causes autoscaling to stop working in a persistent hub setup, so we had to downgrade to a previous version

2i2c-org/infrastructure#5601

Conclusion #

The biggest plus is the people. We got support from Julian Pistorius, who helped us a lot to both fix and validate some of the behaviours we were experiencing. Going through the Jetstream2 support process was also a pleasant experience, because they were super prompt in answering and very nice.

Jetstream2 has a big advantage over other cloud providers: its openness through the ACCESS program. This is very useful for researchers, and less costly than other cloud providers. Being able to offer hubs through the ACCESS program lets 2i2c make them accessible to more researchers in a more cost-efficient way.

Higher complexity also comes with more control over the infrastructure, which has its advantages.

Challenges aside, the experience was a good one and the outcome was positive: 2i2c is now able to deploy both mybinder.org-like hubs and persistent storage hubs on Jetstream2 hardware, from the same cloud-agnostic infrastructure.

Acknowledgements #

Thanks to Project Pythia for funding and collaborating with us on this work.

Harnessing Marine Open Data Science for Ocean Sustainability in Africa, South Asia and Latin America

Tue, 11 Mar 2025 00:00:00 +0000

Thank you to Emilio Mayorga for sharing this publication.

Several community members, including Paige Martin (Australian Climate Simulator), Eli Holmes (NOAA Fisheries), and Emilio Mayorga (University of Washington) published case studies in Oceanography magazine’s “Vision for Capacity Sharing” issue.

Their article Harnessing Marine Open Data Science for Ocean Sustainability in Africa, South Asia, and Latin America highlights the benefits of hackweek-style collaboration and learning events to build capacity in underrepresented communities, using 2i2c-supported JupyterHub for seamless set up and effective data sharing.

More on these three specific initiatives is available at their respective websites:

COESSING, Coastal Ocean Environment Summer School In Nigeria and Ghana.
OHWe - OceanHackWeek en Español (in Spanish).
ITCOocean Hack2Week (an Indian Ocean program). Training Course & HackWeek On Machine Learning Based Species Distribution Modeling.

We’re happy to see these communities extend their impact and make interactive computing more accessible to participants around the world.

Chris is joining Project Jupyter's Executive Council

Mon, 10 Mar 2025 00:00:00 +0000

We are proud to announce that 2i2c’s Executive Director, Chris Holdgraf, was recently elected to Jupyter’s Executive Council. The 2i2c team discussed whether Chris should run for this position last year, and concluded that it was a way for our non-profit to both support Jupyter’s mission at a strategic level, and represent the interests of research and education communities in Jupyter’s direction. Chris wrote a blog post about his reasons for running with more information.

One of Chris’ goals is to be a transparent source of information about what the council is working on, where its priorities lie, and what major challenges it is trying to tackle. He’s written two blog posts that describe some of his experiences so far, at the links below:

We’re hopeful that this is a way for 2i2c to scale its impact and lean into its commitment to open technology. Chris intends to keep writing about his personal experience via his blog, and we’ll provide updates here for major developments that are relevant to 2i2c’s network of communities. We’re proud to have Chris in this role, and excited for his contributions to the Jupyter community!

Acknowledgements #

Strategic open source support like this is funded by a grant from The Navigation Fund and fees from our member organizations.

Simplifying and speeding up Binder builds with BuildKit

Mon, 03 Mar 2025 00:00:00 +0000

Chris and Yuvi recently wrote a blog post on the Jupyter blog about a recent experiment to significantly reduce the cost of running a node on the mybinder.org federation.

Acknowledgements #

Project Pythia and NASA Open Science / ScienceCore provide support for some of our work with the Binder project.
JupyterHub for working with us to get this new node deployed for mybinder.org.

Enforcing per-user storage quotas now available on GCP

Tue, 25 Feb 2025 14:18:04 +0000

Building upon our previous work developing per-user storage quotas for our AWS infrastructure, we are pleased to announce that this feature is now available for GCP-hosted hubs!

To provide this feature on GCP, we have updated our infrastructure provisioning system to create persistent disks and enable automatic backups of the disk for disaster recovery purposes. However, the systems we had already developed for AWS, such as jupyterhub-home-nfs and our alerting system through Prometheus Alertmanager, are vendor agnostic and work right out of the box with the new architecture!

If you would like to try this feature on your 2i2c-managed JupyterHub, please get in touch.

Acknowledgements #

This project was developed and deployed in collaboration with Tarashish Mishra from Development Seed, funded through the NASA VEDA project.

Open infrastructure for collaborative geoscience with Project Pythia: Learning how to deploy a BinderHub on Jetstream2

Wed, 12 Feb 2025 00:00:00 +0000

Project Pythia and the “Jupyter notebook obsolescence” problem #

Project Pythia provides educational resources for essential software tools that enable open, reproducible and scalable geoscience, such as the Pangeo stack of packages (Xarray, Dask, Jupyter). Their Cookbooks are crowdsourced, community-curated, and open-source collections of Jupyter notebooks that demonstrate how to use these tools for cloud-native, geoscientific workflows (see our Project Pythia Cookoff blog post). However, “Jupyter notebook obsolescence” is a common problem: tutorials that were created a few years ago may no longer work due to changes in the software ecosystem, which hampers the reproducibility of scientific results. A reproducible execution environment and the infrastructure to support it are essential for the long-term sustainability of these educational resources.

Leveraging NSF-funded cyberinfrastructure for BinderHub #

A BinderHub allows users to dynamically create custom computing environments from Binder-ready repositories containing computational notebooks and configuration files that describe the software environment required to run them. A public Binder service exists at mybinder.org (see our blog post about joining the mybinder federation 🎉) and is a successful example of how open cloud infrastructure can accommodate reproducible execution environments.

The resources available on such a public service are limited, so 2i2c and Project Pythia have been exploring how to deploy a BinderHub backed by larger resources from the NSF-funded cloud computing platform Jetstream2. This allows for larger simultaneous user loads, such as at workshops, as well as access to more powerful distributed and parallelized workflows required to process large geoscientific datasets, under a persistent resource allocation.

Learning how to deploy on OpenStack #

Jetstream2 uses OpenStack in order to manage pools of compute, storage and networking resources, and for our purposes we specifically make use of OpenStack Magnum Cluster API driver to manage Kubernetes for our deployment.

Cluster API needs a CAPI management cluster in order to manage other Kubernetes clusters, called workload clusters. On Jetstream2, this management cluster is gracefully created and operated by the Jetstream2 team, which means that the only task to worry about is creating and configuring the workload cluster.

For the workload cluster, we used the OpenStack Terraform provider to define the cluster template, the cluster itself, and the node groups in a reproducible way.

After the cluster infrastructure was successfully created on Jetstream2, deploying BinderHub to it was a seamless experience, no different than on the other cloud providers we already supported, thanks to the 2i2c hub infrastructure also being cloud agnostic.

We also learnt about some limitations of the OpenStack Magnum driver project. These were expected, given that it is a relatively recent project that is still slowly being adopted, and we have reported all of them upstream in the hope that they will soon be fixed.

Acknowledgements #

Jetstream2: Explore ACCESS allocation and Julian Pistorius for technical support
Thanks to Project Pythia for funding and collaborating with us on this work.
Andrea Zonca for preliminary work on Kubernetes deployments on Jetstream 2

Announcing backups for GCP-hosted hubs!

Fri, 07 Feb 2025 13:08:22 +0000

2i2c are pleased to announce the development and deployment of automated backups of home directories on GCP-hosted hubs!

We have developed the gcp-filestore-backups project, which regularly creates backups of JupyterHub home directories for disaster recovery purposes. The project is a Python wrapper around the gcloud tool that regularly requests backups of the Filestore hosting JupyterHub’s user home directories, by default on a daily basis. The script also manages retention of these backups by checking how recently the last backup was made and the age of existing backups, by default deleting any backup older than 5 days.

Having these backups enabled means that, in the unlikely and unfortunate case of data loss or corruption, we can reinstate the home directories of the hub to a relatively recent state that is at a maximum of 1 day prior to the incident.
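The retention behaviour described above boils down to two decisions on each run: whether a new backup is due, and which existing backups have aged out. Here is a minimal sketch of just that decision logic, assuming backup creation times are available as timezone-aware datetimes. The function name `plan_backup_actions` and its signature are hypothetical, not gcp-filestore-backups’ actual code, and the sketch does not call gcloud. The defaults mirror the daily frequency and 5-day retention mentioned in this post.

```python
from datetime import datetime, timedelta, timezone


def plan_backup_actions(backup_times, now,
                        frequency=timedelta(days=1),
                        retention=timedelta(days=5)):
    """Decide whether a new backup is due and which existing backups have aged out.

    backup_times: creation times of existing backups (timezone-aware datetimes).
    Returns (create_new, to_delete).
    """
    latest = max(backup_times, default=None)
    # A new backup is due if none exist yet or the newest one is at least `frequency` old.
    create_new = latest is None or now - latest >= frequency
    # Anything older than the retention window is eligible for deletion.
    # A caller would normally create the new backup first, so that
    # pruning never leaves the disk without any backup.
    to_delete = sorted(t for t in backup_times if now - t > retention)
    return create_new, to_delete


now = datetime(2025, 2, 7, 12, 0, tzinfo=timezone.utc)
# One backup per day, each an hour past the daily mark: ages 1h, 1d1h, ..., 6d1h.
backups = [now - timedelta(days=d, hours=1) for d in range(7)]
create_new, to_delete = plan_backup_actions(backups, now)
print(create_new, len(to_delete))
# False 2
```

Passing `retention=timedelta(days=2)` would model the 2-day retention period used for our deployed hubs. With a daily frequency, the newest backup is never more than about a day old, which is where the "maximum of 1 day" recovery point comes from.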

We have deployed gcp-filestore-backups to all our GCP hubs presently running, with a retention period of 2 days. If you would like to discuss this further with us, please get in touch!

As ever, this project has been developed openly in line with our Right to Replicate so you can deploy it against your own infrastructure!

Our product goals for Q1 2025

Sat, 01 Feb 2025 00:00:00 +0000

This quarterly post is coming out a little bit late - our goal was to post this in early January, but the year has been more complicated than we bargained for :-)

Over the past year, 2i2c has made team-wide efforts to improve our product planning and delivery. A key part of this is re-organizing an integrated Product and Services team that brings our strategic planning, engineering, and service delivery closer together. We’ve also built systems for planning and measuring progress within the P&S team, and a product initiatives system for planning major work.

Our goal is to organize our product work around a small set of core themes to help us focus and prioritize. As part of this, we’d like to share platform enhancement goals for roughly each quarter. These are not guarantees, but we share them to be transparent about where we think we can be the most impactful in the next few months. Here are the major areas where we hope to improve 2i2c’s platform in Q1 2025.

Expand access to cloud providers and improve data safety #

One of 2i2c’s goals is to showcase the ability of open infrastructure to be deployed on a variety of infrastructure providers. This includes user-facing features, as well as guardrails and safety measures.

In Q1 2025 we are working to bring closer feature parity between hub deployments on AWS and GCP, while enabling disaster recovery with automated home directory backups.

Explore deploying on public infrastructure providers #

Many communities in research and education are interested in leveraging publicly-owned infrastructure providers like JetStream 2 and the National Research Platform. While 2i2c has historically focused on commercial cloud providers because of their highly reliable Kubernetes platforms, we think it is important to explore publicly-owned infrastructure providers as well.

In Q1 2025 we’ll begin this expansion by deploying JupyterHubs and BinderHubs on JetStream 2, which will give communities access to publicly-funded computing resources. We will use this experience to decide whether it’s sustainable for us to deploy on this and other publicly-owned infrastructure providers.

Enable enhanced community knowledge bases with Jupyter Book 2 #

A key theme we aim to enable is sharing within and between community hubs. This is a critical part of the data science workflow because it allows people to collaborate on the same ideas, and build on top of one another’s ideas. An early target for this is to facilitate lightweight sharing of computational content so that community members can learn from one another more effectively.

In Q1 2025 we want to help get Jupyter Book 2’s beta released, and provide an out-of-the-box configuration for our communities to use it with their hubs. This includes adding landing pages and better integration with JupyterHub via launch buttons to create a more seamless experience between documentation and interactive computing.

Another key aspect of sharing is sharing the computational environment as well. This would allow communities to share not only their content, but also live infrastructure that allows others to reproduce and interact with their work. We think that investing more time into improving and deploying BinderHubs (the technology behind mybinder.org) will help us learn more about how to make this a reality.

In Q1 2025 we plan to grow our capacity to deploy BinderHubs across multiple cloud providers. This will allow hub users to build their own Binder environments on the fly and make it possible to share these environments with others, enabling better reproducibility and collaboration within communities.

Give communities more visibility and control over their hub setup and costs #

Perhaps the biggest perceived risk to using cloud infrastructure is the possibility of runaway costs. Community leaders are often nervous that something unexpected will happen and they’ll have to foot a giant bill at the end of the month. We think that reducing this risk is a key way to make cloud infrastructure safer and more useful for research and education communities.

In Q1 2025 we aim to add more visibility into hub usage and more controls over resources via quotas. This will allow finer control over resource budgets such as CPU, memory, and storage. We’ll also work on assigning users to groups, allowing communities greater control over resource allocation across large user bases.

A key goal of our Navigation Fund grant is to streamline ourselves into a few repeatable, scalable service offerings at different price points. This will allow us to more easily support new communities and provide a more consistent experience for users.

In Q1 2025 we’d like to define a starting point that we can begin to iterate on. We’ll define a new set of pricing based around a tiered service model, and decide on an initial set of features and services to include with each. Our goal will be to have something defined quickly so that we can iterate a few times with community feedback before the quarter is over.

Standardize our community support services #

Finally, we’ve audited our ongoing support practices and realized that we aren’t always delivering them in an efficient way. We often share the same information in one-on-one conversations, and aren’t effectively leveraging our community network to support and learn from one another. We’d like to standardize and boost the scalability of our support services.

In Q1 2025 we want to explore how we can more scalably and efficiently provide hands-on guidance, expert co-creation, and support to communities. Our goal is to define a starting point for these services so that we can offer this support in a sustainable way and begin to learn from our experiences. We also want to build a mechanism for scoping (and pricing) additional capacity that is needed beyond standard community services.

Another update coming in Q2 #

Our aim is to use this blog post as a guide for the quarter, and to make progress in as many areas above as we can. As part of our Q2 planning process, we’ll provide a retrospective on the accomplishments we’ve made towards this effort, and will provide an update for our community on our progress. Stay tuned for more!

Acknowledgements #

Our strategic and organization-level work is supported by a grant from The Navigation Fund and fees from our member organizations.

2i2c joins the mybinder.org federation with a cheaper and faster way to deploy BinderHub

Wed, 29 Jan 2025 00:00:00 +0000

If you’re interested in supporting mybinder.org with cloud resources, financial resources, or human resources, please see the Support Binder page for how you can help.

tl;dr: The 2i2c team is joining the mybinder.org federation with a single-node BinderHub instance at 2i2c.mybinder.org. It should be much cheaper to run than auto-scaling Kubernetes clusters, and might be a good way to support mybinder.org more sustainably. For questions or comments, join this Jupyter Zulip thread.

mybinder.org is a massive public service for creating and sharing reproducible computational environments. It is managed by the JupyterHub team and members of the mybinder.org federation. One challenge in running mybinder.org is identifying cloud credits or financial resources to support the cloud infrastructure that runs the service. Two years ago, Google stopped supporting the mybinder.org federation with cloud credits, and last month the federation lost more capacity, leaving only GESIS and OVH as remaining federation members¹. This makes mybinder.org less reliable, slower, and generally less useful to the world.

The landscape of cloud infrastructure technology and services has changed considerably, and we think that there’s a way to deploy BinderHub instances with lower costs and less complexity. We’ve accomplished this by deploying a single-node Kubernetes cluster on a VM provider that is much cheaper, now running at 2i2c.mybinder.org. This both relieves Binder’s short-term capacity shortage and may provide an easier pathway for others to support the project in the future.

Below, we’ll describe what has changed to enable this, what we’re deploying, and what the impact should be.

Cloud infrastructure has become cheaper and more commodified #

A key theory of mybinder.org (and 2i2c) is that commercial cloud infrastructure will be commodified over time: what begins as cutting-edge functionality will become commonplace and offered across all cloud providers. As a result, costs will go down over time. Abstractions like Kubernetes will allow you to easily migrate workflows and infrastructure between cloud providers, so you can follow costs to wherever the better options are. That’s essentially what is happening here.

There are two key changes that make it much easier to deploy a BinderHub instance at a fraction of the cost:

First, Kubernetes has matured and become easier to deploy. When mybinder.org started, it was using the cutting-edge of Kubernetes functionality. This meant that we needed to use cloud providers that provided a managed Kubernetes service to deal with this complexity. A managed Kubernetes offering tends to be expensive, offered by only a few cloud providers, and thus raises costs across-the-board for the provider that offers it.

However, this was almost a decade ago, and Kubernetes has become both more functional and more stable. There are now many more ways of running Kubernetes, especially for simpler workflows that don’t require autoscaling. In the last several months, we’ve been experimenting with single-node Kubernetes workflows via K3s². K3s is a lightweight Kubernetes distribution that is much easier to deploy and manage. It’s designed for things like edge computing and low-resource environments, and it can be deployed with a single script!

By running a Kubernetes cluster on a single node, we don’t need a “managed Kubernetes service”, which means we can choose from a much larger pool of infrastructure / cloud providers. If all we need is a running VM, this is something the tech industry has been doing for decades.

Second, managed object storage services have more open source options, and are more commodified and cheaper. In addition to Kubernetes, the other thing that BinderHub needs is a way to store and retrieve images for the environments that it builds. This also used to be a fairly complex problem, and thus required managed solutions from cloud providers that charged a premium for their service. However, a number of open source object storage solutions have emerged and made it much easier for providers to support this workflow.³ Because these are open source, infrastructure providers can provide managed object storage at a fraction of the cost.

Because of these two things, we’ve learned that we can run a BinderHub instance on a single VM from a much larger pool of infrastructure providers. This means we should be able to run BinderHub instances at a fraction of the cost.⁴

Deploying BinderHub on a single-node VM is cheaper and simpler #

Last week, we deployed 2i2c.mybinder.org, a single-node Kubernetes instance on Hetzner cloud using K3s. This will run on a single node VM, with a Kubernetes instance that is entirely managed by us, and with managed object storage from Hetzner. Compared to other cloud providers, it is around 5x cheaper per month.

Comparison of rough monthly costs across different cloud providers for similar VM instances. These are rough estimates based on cloud provider pricing pages for an on-demand VM with around 190GB RAM. Pricing pages: Hetzner Cloud ~$300, Microsoft Azure ~$1,300, Google Cloud Platform ~$1,500, Amazon Web Services ~$1,600.
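As a quick check on the "how many times cheaper" claims, the short Python sketch below divides each provider's rough monthly price by Hetzner's. The prices are the approximate figures from the comparison above, not live quotes.

```python
# Rough on-demand monthly prices (USD) for a ~190GB RAM VM, from the comparison above.
monthly_cost = {
    "Hetzner Cloud": 300,
    "Microsoft Azure": 1300,
    "Google Cloud Platform": 1500,
    "Amazon Web Services": 1600,
}


def cost_ratios(costs, baseline="Hetzner Cloud"):
    """How many times more expensive each provider is than the baseline."""
    base = costs[baseline]
    return {
        name: round(cost / base, 1)
        for name, cost in costs.items()
        if name != baseline
    }


for name, ratio in cost_ratios(monthly_cost).items():
    print(f"{name}: {ratio}x")
```

The ratios land between roughly 4.3x and 5.3x. That is consistent with the "about 4x" to "around 5x" savings quoted in this post, depending on which provider you compare against.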

Running a single-node Kubernetes instance will be a cheap and effective way to handle a lot of mybinder.org’s capacity needs. Because it’s a single node cluster, there is no auto-scaling (one reason it is so cheap), which reduces a lot of the complexity we’ll have to manage. These are acceptable tradeoffs for a service like mybinder.org, which runs entirely ephemeral sessions with very limited resources and no promises about uptime, persistence, etc.

You might be wondering: “I thought Kubernetes was supposed to save money.” Normally, running Kubernetes for scalable workflows does save costs because you can scale infrastructure to match your capacity needs. Without scaling, you’d need to provide a VM that can always handle your maximum capacity needs (and pay for the costs the entire time). With Kubernetes, you can request and remove nodes to grow your capacity as-needed (and save money doing so). It looks something like this:

The cost difference between a single large VM vs scalable nodes. Given variable usage over time, Kubernetes allows you to scale your cost up and down with need, which is more efficient than paying for a single VM that can withstand your maximum capacity.
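To make this tradeoff concrete, here is a toy Python model. A fixed VM has to be sized for peak demand and is paid for every hour. An autoscaling cluster only pays for the nodes it needs in each hour. The demand profile and per-node-hour prices are hypothetical, and the model ignores node start-up time, bin-packing, and managed control-plane fees.

```python
def fixed_vm_cost(demand, price_per_node_hour):
    # A fixed VM must be sized for the peak and is paid for every hour.
    return max(demand) * len(demand) * price_per_node_hour


def autoscaled_cost(demand, price_per_node_hour):
    # An autoscaling cluster only pays for the nodes needed in each hour.
    return sum(demand) * price_per_node_hour


# Hypothetical demand, in nodes needed, over six hours.
demand = [1, 1, 4, 4, 2, 1]
print(fixed_vm_cost(demand, 1.0))    # 24.0
print(autoscaled_cost(demand, 1.0))  # 13.0
# Same peak-sized capacity at a quarter of the price per node-hour.
print(fixed_vm_cost(demand, 0.25))   # 6.0
```

At equal prices, autoscaling wins (13 vs 24). But if the fixed capacity costs a quarter as much per node-hour, the "wasteful" fixed VM becomes the cheapest option (6 vs 13).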

However, there is a built-in cost you pay when you use a service that provides managed Kubernetes. Managed Kubernetes services are complex and expensive, and this is reflected across-the-board in the provider’s costs. What if we could achieve the same outcome with a much simpler cloud offering like a single VM?

We did a bit of research and discovered that the Kubernetes and object storage landscape has indeed evolved significantly since the early days of mybinder.org. For example, Hetzner is a cloud provider that has been around for a long time. It has single-node VMs that are about 4x cheaper than their counterparts in Google Cloud or AWS, and provides managed object storage that uses MinIO in a cost-effective way. Using K3s, we can run a lightweight, single-node Kubernetes runtime on this node, and deploy a BinderHub with the same infrastructure as any other BinderHub federation member.

By our estimate, we could fit around 400 simultaneous mybinder.org sessions on this single node (because each session uses very few cloud resources). This is already the majority of mybinder.org’s capacity needs, and at a much lower cost than using a scalable Kubernetes cluster. The cost picture looks something like this:

If your single VM is much cheaper, it might still be the cheapest option. In the case of a Hetzner VM, it has roughly the same capacity as another cloud provider’s VM, but at 1/4 of the cost.
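To see why ~400 sessions is plausible, you can divide the node's resources and cost by the number of concurrent sessions. The Python sketch below assumes the node is the ~190GB RAM VM at roughly $300 per month from the cost comparison. It ignores memory reserved for Kubernetes system pods and for BinderHub itself, so real per-session headroom is somewhat lower.

```python
# Approximate figures quoted in this post; treat them as assumptions.
NODE_RAM_GB = 190           # on-demand VM with ~190GB RAM
MONTHLY_COST_USD = 300      # approximate Hetzner price from the comparison
SIMULTANEOUS_SESSIONS = 400


def per_session(total, sessions):
    """Share of a fixed resource available to each concurrent session."""
    return total / sessions


ram = per_session(NODE_RAM_GB, SIMULTANEOUS_SESSIONS)
cost = per_session(MONTHLY_COST_USD, SIMULTANEOUS_SESSIONS)
print(f"RAM per session: {ram:.3f} GB")
print(f"Cost per concurrent session slot: ${cost:.2f}/month")
```

That works out to a bit under half a gigabyte of RAM per session, which fits mybinder.org's small, ephemeral sessions.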

2i2c.mybinder.org now serves 70% of the mybinder.org federation #

About a week ago, we launched 2i2c.mybinder.org using the approach described above. We intended to run this as a longer experiment, but we believe it has already proven useful enough to consider “ready for production”. We recently increased 2i2c.mybinder.org’s share of the load to 70% and will continue to monitor its performance over time. Here’s a plot of where each mybinder.org session has been run over the past ten days; on the left, you can see the moment where we turned on 2i2c.mybinder.org:

Sessions launched on mybinder.org’s federation over the past ten days. The yellow area represents sessions run on 2i2c.mybinder.org. They now make up the majority of launches on mybinder.org. Prior to this, gesis.mybinder.org was the only remaining federation member.

For now, 2i2c is sponsoring a max of €350 a month (with some currency conversion noise) to run this service. We’ll provide in-kind labor to run this node, and treat it as an organizational investment in supporting open science, as well as learning new Kubernetes and cloud infrastructure workflows. We’re going to use funds recovered from communities in our community hub network, along with in-kind labor to build out this experiment.

In six months, we’ll evaluate how much effort it was to run this node for mybinder.org, whether it meaningfully helped with mybinder.org’s capacity, and whether it was sustainable for us from a time and labor perspective.

Others can join the mybinder.org federation using this approach as well #

We think that developing this single-node BinderHub workflow will make it much easier for others to join the mybinder.org federation, because it lowers the infrastructure and skills complexity needed to join. Here is a brief guide we’ve written for deploying a BinderHub with K3s. We are helping a few interested organizations deploy their own BinderHubs in this way in order to validate the idea, and are hopeful that this makes it much easier to grow mybinder.org’s capacity via new federation members.⁵

We’re excited to experiment with new ways to support mybinder.org. We think this is an excellent example of how open standards and technology lead to cloud workflows with lower costs and more flexibility. We also think it’s a good example of how it is valuable to have organizations aligned with open science (like 2i2c!) acting in this space. If you have any questions or comments, please join this Jupyter Zulip thread.

Anybody want to fund this? #

If you’re interested in making open science infrastructure like Binder more scalable and sustainable, we’d love to find more resources to both sustain this node and cover more development time to run this experiment. Feel free to reach out here.

If you have access to VMs and object storage, and are interested in running a mybinder.org federation member using the methods described here, check out our brief guide for deploying a BinderHub with K3s.

If you’re generally interested in supporting mybinder.org with cloud resources, financial resources, or human resources, please see the Support Binder page for how you can help.


Acknowledgements #

Thanks to the JupyterHub community for helping us set up this new node.
Thanks to our member communities whose fees currently support this work.

Many thanks to GESIS and OVH for their continued support of mybinder.org, your contributions to keeping this service running are critical! ↩︎
Thanks to Carl Boettiger for collaborating on this with us! ↩︎
One example is MinIO, which is used by Hetzner to provide managed object storage for their single-node VMs. ↩︎
For example, Hetzner provides a single-VM option with managed object storage that is roughly 25% of the cost of other cloud providers that also offer autoscaling Kubernetes services. There are many other infrastructure providers who could be used in this way. ↩︎
We’re also experimenting with a few other ways to reduce the complexity and costs of running a BinderHub even further, but will have more on that later as we learn more :-). ↩︎

Enforcing per-user storage quotas with `jupyterhub-home-nfs`

Tue, 28 Jan 2025 09:57:28 +0000

When sharing a storage disk between users, as is usually the case in a JupyterHub deployment, it is important to put guardrails in place so that one user cannot consume the entire storage capacity at the expense of everyone else. To this end, 2i2c, in close collaboration with Development Seed, has developed the jupyterhub-home-nfs project, a Helm chart that enforces per-user quotas on storage space.

Note that this feature is currently available to AWS hosted hubs only and will be rolled out to other cloud providers in the future.

Under the hood, the Helm chart runs NFS Ganesha as an in-cluster NFS server, backed by XFS as the underlying filesystem. Storage quota is enforced through XFS’s native quota management utility xfs_quota.
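XFS typically enforces per-directory limits through project quotas: each home directory is tagged with a numeric project ID, and that ID is then given a block limit. The Python sketch below only builds the corresponding xfs_quota command lines, assuming the filesystem is mounted with project quotas enabled (`prjquota`). The mount point, directory, project ID, and limit are hypothetical, and jupyterhub-home-nfs may organize this differently internally.

```python
import shlex


def xfs_project_quota_commands(mountpoint, home_dir, project_id, hard_limit):
    """Build xfs_quota invocations that cap one user's home directory.

    Assumes the XFS filesystem at `mountpoint` is mounted with project
    quotas enabled. Illustrative only; values passed in are hypothetical.
    """
    return [
        # Tag the directory tree with a numeric project ID.
        ["xfs_quota", "-x", "-c",
         f"project -s -p {home_dir} {project_id}", mountpoint],
        # Set a hard block limit (e.g. "10g") for that project ID.
        ["xfs_quota", "-x", "-c",
         f"limit -p bhard={hard_limit} {project_id}", mountpoint],
    ]


for argv in xfs_project_quota_commands("/export", "/export/home/alice", 1001, "10g"):
    # Print shell-quoted commands; running them for real requires root.
    print(shlex.join(argv))
```

Returning argument lists rather than running them keeps the sketch side-effect free. On a real server, each list could be passed to `subprocess.run(argv, check=True)`.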

Since this feature moves our infrastructure away from managed filesystems (such as AWS’s Elastic File System) that cannot support per-user storage quotas, we have also developed monitoring and alerting mechanisms that will let us know when the disks are getting full, and automated back-ups for disaster recovery.
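The core of a "disk is getting full" alert is a threshold on used versus total capacity. The minimal Python sketch below shows that check using the standard library's `shutil.disk_usage`. The 90% threshold is a hypothetical example. In production this kind of rule usually lives in a monitoring system such as Prometheus rather than in a script.

```python
import shutil


def usage_fraction(total_bytes, used_bytes):
    """Fraction of the disk that is currently in use."""
    return used_bytes / total_bytes


def should_alert(total_bytes, used_bytes, threshold=0.9):
    """True when a disk is at or above `threshold` of its capacity."""
    return usage_fraction(total_bytes, used_bytes) >= threshold


def check_path(path, threshold=0.9):
    # shutil.disk_usage returns a named tuple with total, used, and free bytes.
    usage = shutil.disk_usage(path)
    return should_alert(usage.total, usage.used, threshold)


print(check_path("/"))
```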

If you would like to try this on your 2i2c-managed hub, please get in touch.

This project can also be used with any Kubernetes-based JupyterHub, as per our Right to Replicate policy, so please try it out on your own deployment and let us know what you think!

Acknowledgements #

This project was developed and deployed in collaboration with Tarashish Mishra from Development Seed, funded through the NASA VEDA project.

Designing for an ecosystem: a case study in cross-project open source contribution

Tue, 21 Jan 2025 00:00:00 +0000

A key challenge in the open source space is that projects are often independent and autonomous, with relatively few formal ways to collaborate and coordinate efforts. While this usually isn’t a big deal, it means that opportunities to grow the impact of an ecosystem are often missed when doing so requires coordinated development among multiple stakeholders.

This is one of the reasons we created 2i2c’s open community hub platform. By deploying a single platform that utilizes entirely open infrastructure that we contribute back to, we have visibility over a variety of projects along with the need to combine them together for a specific end-user outcome. One such development scenario recently came up involving Jupyter Book 2 and JupyterHub.

Allowing readers to “bring their own Binders” #

We’ve recently been working to integrate Jupyter Book 2 workflows with our community hubs for a more seamless experience (for example, having book pages link back to interactive cloud sessions that allow users to interact with the content). We imagine a network of Jupyter Books that all build upon the same core infrastructures (JupyterHub, Binder, etc) for cloud-based computing. Our hope is to allow a user to bring their own Binder with them so that they can interact with another book’s content with their own cloud infrastructure. For example:

A student with access to binder.myuniversity.edu could read a Jupyter Book created by a professor at otheruniversity.edu.
The Jupyter Book is defined with a Binder specification that has a recipe for re-building the environment needed to run the book’s content.
From the professor’s book, the student can choose to launch an interactive Binder session on their university’s Binder, allowing them to interact with the book’s content on their own infrastructure.

We want a workflow like this to be as seamless and un-complicated as possible. We also want it to follow the same fundamental workflow as the nbgitpuller-based launch buttons. Along the way, we realized that we needed to coordinate development across Jupyter Book 2, JupyterHub, and BinderHub.
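BinderHub launch links follow a predictable pattern (`<binderhub>/v2/gh/<owner>/<repo>/<ref>`). That is what makes "bring your own Binder" possible: the book supplies the repository and ref, and the reader supplies the BinderHub. The Python sketch below builds such a link. It reuses binder.myuniversity.edu from the example above, while the repository names and notebook path are hypothetical. This is an illustration of the URL scheme, not the myst-theme implementation.

```python
from urllib.parse import quote, urlencode


def binder_launch_url(binderhub, owner, repo, ref="HEAD", urlpath=None):
    """Build a launch link for a GitHub repository on any BinderHub.

    The reader chooses `binderhub` (their own Binder), while the book
    supplies owner/repo/ref describing the environment to build.
    """
    url = (f"{binderhub.rstrip('/')}/v2/gh/"
           f"{quote(owner)}/{quote(repo)}/{quote(ref, safe='')}")
    if urlpath:
        # Optionally open a specific page (e.g. a notebook) once the session starts.
        url += "?" + urlencode({"urlpath": urlpath})
    return url


print(binder_launch_url("https://binder.myuniversity.edu",
                        "otheruniversity", "course-book", "main",
                        urlpath="lab/tree/index.ipynb"))
```

Swapping only the first argument points the same book at a different Binder, which is the property the workflow relies on.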

The three projects (Jupyter Book, BinderHub, and JupyterHub) that needed to work together to enable ‘bring your own binderhub’ workflows.

Getting Jupyter Book to discover JupyterHub #

As we began developing this workflow, we realized that there was a blocker in the JupyterHub and BinderHub ecosystem that needed to be fixed. We needed a way to query a JupyterHub for service discovery through an unauthenticated endpoint. Basically, a way to ask a hub “what kind of hub are you, and how can we launch an interactive session on you?” Doing this is simple enough: JupyterHub already has a way of reporting its version and application type, which allows us to infer how to launch interactive sessions. But we hit a snag when making these requests from a web page in the browser.
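As a rough illustration of service discovery, the Python sketch below asks a hub for its version via the JupyterHub API root (`/hub/api/`), which returns a small JSON document containing a `version` field. In Jupyter Book this request would be made from JavaScript in the browser; Python is used here only for brevity. The hub URL is hypothetical, and hubs served under a custom base URL would need a different path.

```python
import json
from urllib.request import urlopen


def hub_api_url(hub_url):
    """URL of the JupyterHub API root for a hub served at `hub_url`."""
    return hub_url.rstrip("/") + "/hub/api/"


def parse_hub_version(body):
    """Extract the version string from a JupyterHub API root response."""
    return json.loads(body).get("version")


def discover_hub_version(hub_url, timeout=10):
    # Network call: fetch the API root and read the reported version.
    with urlopen(hub_api_url(hub_url), timeout=timeout) as response:
        return parse_hub_version(response.read().decode("utf-8"))


# Parsing a sample response body (no network access needed).
print(parse_hub_version('{"version": "5.2.1"}'))
```

A client could then use the reported version (for example, the major version) to decide which launch mechanism to offer.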

By default, JupyterHub disallows certain kinds of Cross-Origin Resource Sharing (CORS) requests, in order to restrict other applications from abusing a JupyterHub’s API. If you hit parts of a JupyterHub API from the command line, things work fine. But if you do the same thing via JavaScript from a website, the request is disallowed. This was a problem because we wanted Jupyter Book (a web application) to be able to make requests of JupyterHub’s API.
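CORS is enforced by the browser. A server opts in by sending an `Access-Control-Allow-Origin` header, which is also why command-line requests, which no browser mediates, worked all along. The minimal Python sketch below captures the shape of the fix: only an allowlisted, read-only endpoint gets a permissive CORS header, and everything else stays locked down. The allowlist and function are hypothetical; this is not JupyterHub's actual implementation.

```python
# Paths that are safe to expose to any website (hypothetical allowlist).
CORS_ALLOWED_PATHS = {"/hub/api/"}


def cors_headers(path, origin):
    """Return extra response headers for a request to `path`.

    Only allowlisted, read-only endpoints get a permissive CORS header.
    Every other path gets no CORS header, so browsers stop cross-origin
    JavaScript from reading the response.
    """
    if origin is None:
        # No Origin header: same-origin or non-browser request, CORS not involved.
        return {}
    if path in CORS_ALLOWED_PATHS:
        return {"Access-Control-Allow-Origin": "*"}
    return {}


print(cors_headers("/hub/api/", "https://book.example.org"))
print(cors_headers("/hub/api/users", "https://book.example.org"))
```

Scoping the change to one endpoint, rather than opening the whole API, keeps the rest of the hub's API protected from cross-origin scripts.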

So, we realized that we needed to make an upstream contribution in JupyterHub in order to enable an interaction between JupyterHub and Jupyter Book. In this case, it was a relatively simple fix: allowing CORS requests for the specific API endpoint we needed (which is a very lightweight endpoint that is not vulnerable to security risks, and is broadly useful to make accessible)¹. The work spanned three PRs:

jupyterhub/jupyterhub#4966 allows CORS requests for the API that was needed for service discovery in JupyterHub.
jupyterhub/binderhub#1906 enables this workflow on a BinderHub so that its services can be discovered.
jupyter-book/myst-theme#503 adds new launch button functionality to Jupyter Book 2 that allows readers to bring their own Binder / JupyterHub links for launching. (this is what necessitated the above two PRs)

As a result of this upstream contribution loop, JupyterHub can now accept API requests at its “service discovery” endpoint, which means that Jupyter Book (and any other web application) can more easily learn about a hub’s capabilities and version.

We wanted to share this short vignette because it’s a good reflection of the kind of value that 2i2c tries to provide, given its role in helping to build and enhance networks of infrastructure, domain communities, and open source communities. In this case, we enabled a cross-project workflow that required knowledge of each project, and a vision for how they could be used together in a way that exceeded the sum of their parts.

We think there’s a lot more potential in these kinds of workflows, and are eager to continue our work to identify and enhance community-centric infrastructure for interactive computing.

Acknowledgements #

Thanks to the JupyterHub and Jupyter Book communities for collaboration and review on this work.

This “bring your own Binder” workflow benefits NASA training communities including NASA Open Science / ScienceCore, which partially supports this work.

This actually required an interesting bit of team discussion that was much easier with a few 2i2c staff on the JupyterHub team. The original request from Angus was interpreted as opening up the entire hub API to external requests (which is a bad idea!) but we were able to quickly discuss this with the JupyterHub team to clarify that this was only about a very specific API endpoint. This is the kind of communication loop that often goes haywire when you have people contributing to a project without historical relationships to the project’s maintainers. ↩︎

Announcing our formal commitment to open technology

Wed, 15 Jan 2025 00:00:00 +0000

In this post, we’re sharing our Commitment to Open Technology. It is focused on software licenses for reasons we’ll describe below. We hope that it clarifies what kind of licenses we’ll use, and assures our communities that we will not change our stance towards open source technology in the future. This ensures 2i2c’s long-term commitment to community-owned and open infrastructure.

Being a platform and service provider gives us a lot of power, and also introduces a potential source of lock-in for our member communities. While 2i2c’s organizational mission and culture are strongly aligned with open infrastructure, we believe it’s important to encode commitments like these in a formal way to provide both transparency and accountability to our member communities.

Our commitment to open technology #

Below we copy the original language of this policy from our Commitment to Open Technology:

Definitions of MUST, MUST NOT, SHOULD, MAY, etc are defined in RFC 2119

1. All engineering artifacts (code, documentation, etc) produced by 2i2c’s engineering team MUST be licensed under an open source license approved by a non-profit organization that is not 2i2c.
2. Open Source Projects originating at 2i2c, or stewarded by 2i2c, MUST NOT require a Contributor Licensing Agreement that includes Copyright Assignment to 2i2c.
3. The list of external organizations that define licenses we accept are:
   1. the Open Source Initiative
   2. the Organization for Ethical Source.
4. Modifying (1), (2), or (3) MUST be done through a 2/3 majority vote of 2i2c staff.

What does this commitment mean? #

In plain language, here’s what this commitment means:

We’ll only use open source licenses that have been approved by standard non-profits that are broadly recognized by the tech industry.
For anything we build, we won’t require contributors to give up the rights to their contributions via CLAs, so that it is much harder for 2i2c to change our licenses in the future.
Changing this policy will require organization-wide agreement, and in the future we’ll give authority over this policy to a group of people representing our member communities.

Why are licenses and CLAs important? #

Many organizations claim to be committed to open infrastructure, while retaining the ability to change this commitment in the future when it is in their interests. A classic example of this is a “bait and switch” that looks something like this:

A company releases software under an open source license and professes to build an open source community around it.
However, they retain the rights to all of the code in their projects through a Contributor License Agreement (CLA) with copyright assignment. This generally means that contributors must give up the rights to their contribution in order to make that contribution.
Once their product has gained traction and it is in their interests, the company can change the license to whatever they wish (even one that is not open source) because they retain the rights to all contributions in the codebase.
They then leverage this new position as owners of a proprietary project to extract business value or grow their position in a market.

Think this sounds unlikely? Here are just a few recent examples of companies that have switched their license after many years of releasing their technology under an open source license:

We want to assure our communities that 2i2c is not headed down this path, in order to give them confidence in treating us as a long-term service partner.

What does this change about 2i2c’s open source commitment? #

In short: nothing. These are already the principles that 2i2c was committed to from its inception, and already implied via our Right to Replicate. However, we wanted to make these commitments more formally in order to give ourselves more accountability to sticking with them, and to provide more transparency for our community members and stakeholders.

Who is this for? #

We imagine three audiences for this policy:

2i2c present and future staff who want to ensure that their organization remains committed to our open principles. This document provides a sense of psychological safety to have bold discussions about structuring our approach to open source.
Member communities and 2i2c stakeholders who need to have an understanding of the guarantees that we provide in order to trust 2i2c as a service developer and provider. This is similar to the effect our Right to Replicate has.
Open source communities who need to understand our long-term commitment and goals around open technology in order to trust us as a peer and collaborator within open source communities.

We’d love feedback #

We hope that these ideas both clarify our intent and the reason that we think it’s important. We’d love feedback about early refinements to these principles in order to make them more effective, as well as ways that we can provide more community oversight and participation in evolving these policies moving forward. If you have any thoughts to share, please send feedback via e-mail hello@2i2c.org.

Acknowledgements: The creation of this policy and the rationale behind it was led by Yuvi Panda with feedback from 2i2c’s team. This blog post was co-written with Chris Holdgraf. Strategic work like this is supported by a grant from The Navigation Fund.

NASA VEDA & 2i2c Update for Q4 2024 (Oct-Dec 2024)

Tue, 07 Jan 2025 15:18:37 -0800

A non-exhaustive list of things 2i2c and Development Seed did with the NASA VEDA project last quarter!

Automated backups and alerting with `jupyterhub-home-nfs` #

Tracking Issue

jupyterhub-home-nfs is a young project to provide flexible per-user home directory limits on JupyterHub - an important feature for controlling cloud costs. Tarashish Mishra and Sarah Gibson have been leading this project for the last few months. Since we are moving away from AWS Managed EFS here, we had to do some work to recreate some of the benefits EFS gives us out of the box. During this quarter, we:

Set up automated backups so we can recover files in cases of disaster
Set up automated alerting (via Prometheus and PagerDuty) to know if our backing EBS device is getting full and we need to perform a manual intervention
Deployed this to a few other communities (CryoCloud and NMFS Openscapes) to broaden adoption.

We will continue doing work on jupyterhub-home-nfs in the upcoming quarter! If this is functionality you are interested in deploying, please reach out to us to collaborate!

Enable users to dynamically build environments with `jupyterhub-fancy-profiles` #

Tracking Issue

We covered this more extensively in another blog post, so go read that!

This work in particular is a good demonstrator of 2i2c’s value - it started off with a grant from GESIS, and now with support from NASA IMPACT we are able to bring it to a lot of communities, not just the ones that funded it.

Ongoing work here will focus on improving the UX as well as better documentation so users can actually use it!

“Open in QGIS” from VEDA UI #

Tracking Issue

We had worked in the past with many communities in enabling QGIS on the Cloud, and this quarter we got closer to enabling a contextual ‘Open in QGIS’ button in the VEDA Dashboard! Here is a quick demo:

(This shows the workflow when the user is already logged in to the JupyterHub and has started their server)

You can play with this in this preview, although you need to have access to the NASA VEDA hub to fully try it out at this point.

Tarashish from Development Seed is again responsible for most of the work here, available in jupyter-remote-qgis-proxy. You can use it to create ‘magic links’ that open QGIS in a desktop environment in your browser and add a specific layer to it! We hope this helps GIS practitioners in particular use tools they are already familiar with in cloud-based contexts.

Other updates #

We participated heavily in an evaluation process for the authentication and authorization solution to be used across NASA VEDA! Tracking Issue
We are very close to rolling out JupyterHub 5.0 and associated changes across all our hubs, which will enable us to eventually offer per-group shared directories! Tracking Issue

Acknowledgements #

Thanks to the NASA VEDA project for their ongoing support for this work.
Thanks to Development Seed for their collaboration and leadership on this project.

`frx-challenges`: A new tool to host data challenges for Frictionless Research Exchanges

Fri, 06 Dec 2024 00:00:00 +0000

2i2c is pleased to announce the frx-challenges project, a new open source tool to help communities host data challenges on shared infrastructure:

2i2c-org/frx-challenges

This project aims to make it easier for administrators to provide a service that enables users to submit code and data that are evaluated on secure infrastructure with access to private data and resources. It also provides a leaderboard that helps users compare their performance against others.

An example leaderboard for a data challenge, taken from the Cellmap Challenge. Users make submissions that are run against secure, private infrastructure and data, and receive feedback about each submission’s performance. Learn more about the FRX challenges project here: 2i2c.org/frx-challenges/

It is designed to be lightweight and flexible, and can be run on a variety of shared infrastructure. For those who wish to run this project on cloud infrastructure, we’ve also published a Helm Chart to help you deploy frx-challenges with Kubernetes.

While it can be run on its own, we believe that it naturally complements other tools and services for interactive computing and data, such as JupyterHub, Jupyter Book, and Binder. More on that below.

Below is a brief description of the motivation behind this project.

What are Frictionless Research Exchanges #

The project is heavily inspired by David Donoho’s vision of Frictionless Research Exchanges (FRX) as described in Data Science at the Singularity.

In this article, Donoho describes three key pillars for Frictionless Research Exchanges:

The three initiatives are related but separate; and all three have to come together, and in a particularly strong way, to provide the conditions for the new era. Here they are:

[FR-1: Data] datafication of everything, with a culture of research data sharing. One can now find datasets publicly available online on a bewildering variety of topics, from chest x-rays to cosmic microwave background measurements to uber routes to geospatial crop identifications.

[FR-2: Re-execution] research code sharing including the ability to exactly re-execute the same complete workflow by different researchers.

[FR-3: Challenges] adopting challenge problems as a new paradigm powering scientific research. The paradigm includes: a shared public dataset, a prescribed and quantified task performance metric, a set of enrolled competitors seeking to outperform each other on the task, and a public leaderboard. Thousands of such challenges with millions of entries have now taken place, across many fields.

We considered the landscape of tools and services, and felt that [FR-1] and [FR-2] were already well-served by a variety of tools and services for community workspace infrastructure (e.g., JupyterHub: jupyterhub.readthedocs.io), sharable computational environments (e.g., BinderHub: binderhub.readthedocs.io), authoring and reading computational narratives (e.g., Jupyter Book: jupyterbook.org and MyST: mystmd.org), and data I/O tools and standards (e.g., Zarr: zarr.readthedocs.io and Intake: intake.readthedocs.io).

However, there was a missing piece for [FR-3: Challenges]: we could not identify any community-managed infrastructure that facilitated data challenges. Filling that gap is the goal of frx-challenges.

Why facilitate data challenges? #

Data challenges are harder than you think! While it is simple enough to run somebody else’s code locally, data challenges require a systematic, secure, and automated approach to accepting and evaluating submissions in a fair and repeatable way. Here are some of the big challenges to tackle:

Submissions must retain user and team identity, which means that we must keep track of users and their submissions over time, since data challenges are designed to encourage iterative improvement and optimization.
Evaluations must use potentially complex resources and data, since many data challenges publicly share a small dataset and then run submissions against a much more complex dataset.
Evaluations must be totally secure, so that submissions can’t do nefarious things like mine cryptocurrency or extract the challenge’s private data in unintended ways.
Evaluations must be automated, so that running the challenge does not require extensive human intervention and can scale to many users.
Evaluations must be flexible, so that the infrastructure can accept a variety of types of submissions (e.g. code, data, model weights, etc.), run them with arbitrary environments designed by the organizers, and run them with the right hardware to get the job done.
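To make the first requirement concrete, here is a minimal sketch of a leaderboard that keeps every submission per team (so iterative improvement is visible over time) and ranks teams by their best score. It is a hypothetical illustration of the idea, not the data model frx-challenges actually uses; the class names, the tie-breaking rule (earlier submission wins), and the `higher_is_better` switch are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Submission:
    team: str
    score: float
    submitted_at: datetime


@dataclass
class Leaderboard:
    # Some metrics are "higher is better" (accuracy), others "lower is better" (error).
    higher_is_better: bool = True
    # Full history per team, appended in submission order.
    history: dict[str, list[Submission]] = field(default_factory=dict)

    def record(self, submission: Submission) -> None:
        self.history.setdefault(submission.team, []).append(submission)

    def best(self, team: str) -> Submission:
        subs = self.history[team]
        pick = max if self.higher_is_better else min
        # max/min return the first optimum, i.e. the earliest best submission.
        return pick(subs, key=lambda s: s.score)

    def ranking(self) -> list[tuple[str, float]]:
        bests = [self.best(team) for team in self.history]
        # Sort by score, breaking ties in favour of the earlier submission.
        bests.sort(
            key=lambda s: (-s.score if self.higher_is_better else s.score, s.submitted_at)
        )
        return [(s.team, s.score) for s in bests]
```

For example, if team "a" first scores 0.7 and later 0.9 while team "b" scores 0.8, `ranking()` places "a" first with 0.9, and both of "a"'s submissions remain in `history` for later comparison.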

These are just a few of the major challenges that we’ve tried to address with frx-challenges, and we’re excited to see how it goes with our first assisted community challenge: the Cellmap Challenge.

If you’re interested in learning more or participating in this project, follow along at its GitHub repository:

2i2c-org/frx-challenges

The project is still in its very early stages, and we imagine it will evolve significantly. We welcome feedback on how it can more effectively serve a variety of communities.

Acknowledgements #

Thanks to the Howard Hughes Medical Institute (HHMI) for collaborating with us on the Cellmap Challenge, which led to the creation of this project.

Thanks to Kristen Ratan and Strategies for Open Science (Stratos) for enabling this collaboration, and providing strategic guidance and support.

Improving the logged in home page experience in JupyterHub with `jupyterhub-fancy-profiles`

Mon, 18 Nov 2024 12:55:20 -0800

On most research-oriented JupyterHub installations, users would like to customize their server (the environment, available resources, etc.) after logging in. In Kubernetes-based JupyterHub environments, a profile list provides this functionality.

(Profile List for the NASA VEDA JupyterHub with the default implementation from KubeSpawner)

The profile list is the de-facto “logged in homepage” for these users, as that is what they see after they have logged in.
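For readers who have not configured one, a KubeSpawner profile list is a list of dictionaries: each profile can override spawner settings via `kubespawner_override`, and `profile_options` adds dropdowns of further choices. The sketch below is hypothetical; the profile names, image tags, and resource values are illustrative placeholders, not the VEDA hub's configuration.

```python
# Hypothetical profile list; images and resource values are placeholders.
profile_list = [
    {
        "display_name": "Small",
        "description": "Up to 2 CPUs and 4 GB of memory",
        "default": True,
        "kubespawner_override": {
            "cpu_guarantee": 0.5,
            "cpu_limit": 2,
            "mem_guarantee": "1G",
            "mem_limit": "4G",
        },
        # Each entry here becomes a dropdown on the profile selection page.
        "profile_options": {
            "image": {
                "display_name": "Environment",
                "choices": {
                    "python": {
                        "display_name": "Python (SciPy stack)",
                        "default": True,
                        "kubespawner_override": {"image": "quay.io/jupyter/scipy-notebook:latest"},
                    },
                    "r": {
                        "display_name": "R (Rocker)",
                        "kubespawner_override": {"image": "rocker/binder:latest"},
                    },
                },
            },
        },
    },
    {
        "display_name": "Large",
        "description": "Up to 8 CPUs and 32 GB of memory",
        "kubespawner_override": {"cpu_limit": 8, "mem_limit": "32G"},
    },
]

# In jupyterhub_config.py, JupyterHub provides the config object `c`:
# c.KubeSpawner.profile_list = profile_list
```

With z2jh, the same structure goes under `singleuser.profileList` in the Helm values.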

In collaboration with Development Seed, funded by our earlier grant from GESIS as well as the NASA VEDA project, we have been building the jupyterhub-fancy-profiles project to improve this experience.

(Profile List for the NASA VEDA JupyterHub with jupyterhub-fancy-profiles)

Last week, we rolled this new experience out to all 2i2c managed JupyterHubs! Here’s a quick rundown of what this enables:

Descriptions for choices in the dropdowns, making it much easier for users to know what they are getting with each environment (or resource selection).
Fully backwards compatible with the existing KubeSpawner profile list implementation. In our PR to roll this out to all hubs, you’ll notice that we didn’t have to change the structure of any profile lists! So you can safely roll this out to your hubs too without needing to fundamentally change how your profiles are set up.
It is a modern web app (built with React), just like the JupyterHub admin panel. This allows us to evolve and satisfy user needs much faster, as well as expand the pool of people who can contribute to the project!
Support for dynamically building images using mybinder.org style repositories! It talks to the BinderHub API so users can build reproducible environments as they wish, without admin involvement or needing to fully understand how Docker and containers work. Our earlier blog post has more information.

This is just the start, and thanks to ongoing funding from the NASA VEDA project, we are going to continue making improvements to this experience.

Use this in your JupyterHub #

As with everything we build at 2i2c (per our right to replicate policy), this project can be used with any JupyterHub installation that uses Kubernetes. There are instructions in the README. Please try it out on yours and let us know what you think!

Credit #

The project was initiated with funding generously provided by GESIS (see our earlier blog post).
Sanjay Bhangar and Oliver Roick from Development Seed for advocating for this project and contributing heavily to it.
The NASA VEDA project (in particular, Brian Freitag and Alex Mandel), for continued funding (in the form of engineering time) plus being early adopters!

Announcing the Jupyter Book 2 alpha

Mon, 18 Nov 2024 00:00:00 +0000

Cross-posted from the Jupyter Book blog. Note that some MyST functionality is not supported on the 2i2c website – please see the original post for previews.

Over the last ten months, the Jupyter Book team have been hard at work; Jupyter Book has become a Jupyter subproject, and the team¹ announced a plan to release Jupyter Book 2. This post announces the alpha release of Jupyter Book 2.0, which has been re-written from the ground up to use the new MyST-MD engine.

Over the next few months, we will work in preparation for the full release of Jupyter Book 2. Stay tuned for more! The initial documentation for the alpha release can be found at:

next.jupyterbook.org/

Install the Jupyter Book 2 Alpha #

The Jupyter Book 2 alpha is available from PyPI.org. You can install it with pip, using

pip install -U jupyter-book==2.0.0a0

If you use pipx, it’s recommended to run Jupyter Book 2 using

pipx run jupyter-book==2.0.0a0

Jupyter Book 2 needs Node.js installed on your computer. If this is not the case, running jupyter book will prompt you to install it using the nodeenv package that ships with Jupyter Book 2:

❗ Node.js (node) is required to run Jupyter Book, but could not be found.
❔ Install Node.js in '...'? (y/N):

Press y and Enter to proceed.

The Jupyter Book 2 project is a complete re-write of Jupyter Book. We expect there to be bugs and breakages! Please use our support channels to keep us up to date with your findings!

Discord

GitHub Issues

New Features in `2.0 alpha` #

Rich Hover Previews #

The new MyST book and article themes provide useful hover previews for links to other MyST content, Wikipedia, GitHub issues, and many more.

Content from other websites built with the MyST engine can be embedded in your own sites and PDFs:

Cross-referenced content can easily be embedded and re-captioned into other pages and projects, such as this figure to mystmd.org/guide/embed#mylabel.

Simple Instant Search #

A new client-side search uses a simple, modern algorithm for fast local search that finds the results that you care about.

Client-side search uses a simple, modern, Algolia-inspired search algorithm to provide useful search results. We will be iterating on this in the near future for even richer search results!

High Quality PDFs #

PDF documents can now be built with Typst, a high-quality typesetting engine that produces readable error messages and beautiful documents. This feature was the basis for the 2024 SciPy proceedings, which are now built on MyST Markdown and will be accepting Jupyter Notebooks in 2025.

Example of the LaPreprint Typst template for rendering PDFs from Jupyter Book (via the MyST Engine).

Coming Soon in `2.0 beta` #

Custom Styles & Scripts #

Jupyter Book 2 will make it easy to tweak your website styles, and add new website behaviors.

Generate Markdown from Code Cells #

The MyST engine is on-track to support the inclusion of references and other markup features generated by code cells.

Control Cell Visibility with Tags #

In the beta release, Jupyter Book 2 will once again be able to show and hide content according to cell tags.

Jupyter Book 2 vs MyST-MD #

At this early stage, the new Jupyter Book application jupyter book behaves identically to the mystmd engine that it is built upon; as outlined in our Jupyter Book 2 plan, we intend for Jupyter Book to be an “opinionated distribution” of mystmd that shares the same configuration format and CLI. This contrasts with Jupyter Book 1, which was built on top of the Sphinx documentation engine, but offered its own CLI and configuration files. In future, the jupyter book and mystmd CLIs may diverge from one another, but we expect that this will be handled in a graceful manner: mystmd commands should always be compatible with the jupyter book application.

Acknowledgements #

Thanks to Project Pythia for funding some of our ongoing work on Jupyter Book.
Thanks to the Jupyter Book community for their collaboration.

Jupyter Book project has historically been a technical project of the Executable Books organisation. In 2024, the establishment of a Jupyter subproject means that the Jupyter Book project now has its own identity outside of Executable Books. ↩︎

Openscapes goes to the White House!

Mon, 23 Sep 2024 00:00:00 +0000

Our partner Openscapes recently took a trip to the White House to advocate for Open Science and the Open Source ecosystem. Check out their blog post about the experience.

MyST Mini-Hackathon with the DeepLabCut Team

Mon, 02 Sep 2024 00:00:00 +0000

The DeepLabCut Team #

Animal pose estimation using deep neural networks. Courtesy of the DeepLabCut Jupyter Book

The DeepLabCut team is a group of researchers and developers working on open source tools for animal pose estimation, which train deep neural networks on videos.

Chris Holdgraf visited the lab in early August to learn more about how the group were using open-source tools to document and share their work.

Jupyter Book and MyST #

Extensive documentation for using the DeepLabCut software package is already available as a Jupyter Book. The group was interested in adopting MyST Markdown to stay ahead of the curve and upgrade their Jupyter Book (see the related announcement that Jupyter Book 2 will be built upon the MyST-MD engine).

Chris led a mini-hackathon to introduce the group to MyST and collect feedback on where enhancement features could be made in the future. Here’s a summary of the outcomes:

Many improvements were made to the MyST documentation 📖
- The MyST Quick Start Guide was used to onboard new users. Amendments were upstreamed to the MyST docs directly and were immediately available to all.
- A tutorial on executable documents was added to the collection of MyST tutorials.
- MyST-MD installation instructions were simplified using mamba.
A bunch of enhancement features were requested ✨
And we found a bug in the table of contents validation 🐞

Summary #

Hackathons are a great way to impart knowledge and gather feedback in a short space of time. The event spurred rapid contributions to the MyST ecosystem: reusing the MyST quick start guides saved time and effort, while engaging with users directly created a tight feedback loop for enhancements.

Acknowledgments #

Thanks to the Mackenzie Mathis Lab for hosting Chris Holdgraf at EPFL, Lausanne, Switzerland.
Thanks to the Jupyter Book team for collaborating on this with us.

Collaborating with Development Seed to deliver cyberinfrastructure for NASA VEDA

Fri, 12 Jul 2024 00:00:00 +0000

Thank you to Sajjad Anwar and Sanjay Bhangar for contributing to this post.

The VEDA dashboard

The 2i2c team are proud to continue our strong working collaboration with Development Seed, following our previous work on launching the US GHG center (also see the Development Seed blog post). Together with scientists at NASA in our regular sync touchpoints, we have recently delivered a tranche of improvements to the Visualization, Exploration and Data Analysis (VEDA) project.

This platform is designed to thread open-source components together, consolidating GIS delivery mechanisms and processing, analysis and visualization tools, and presenting them in a collaborative interactive computing environment. All code repositories and associated resources stemming from this work are available on the VEDA GitHub page.

In the spirit of fully open development, you can see the objectives the combined 2i2c and Development Seed team had for the last quarter. In this blog post, we will describe some of the significant ones!

Better image management and testing #

The repo2docker-action is a GitHub action simplifying image building and testing for use with JupyterHub, using either a Dockerfile or various configuration files (like requirements.txt, environment.yml, etc) supported by repo2docker. We migrated our image building pipeline from a somewhat homegrown solution to this upstream action, making image updates and testing much easier. In particular, we can automatically run test notebooks on every change we make to the image! This way, we can easily catch any breaking changes in library versions or other package installs without disrupting users. We also debugged and contributed upstream fixes to the testing infrastructure so everyone could benefit from this, rather than just us.

Automatically pulling example notebooks on startup #

When a user logs into a JupyterHub, it is very helpful to have a bunch of example notebooks and other content pre-populated for them so they can get started right away. nbgitpuller is heavily used for this particular use case. However, it requires that nbgitpuller is installed inside the image the user is using, and not all images have it installed. In particular, we wanted to continue using the (wonderful) Rocker images maintained upstream for R users, but they do not have nbgitpuller installed. To solve this problem we built jupyterhub-gitpuller-init, which can be used as an init container to pre-populate user content on persistent home directories regardless of the image used. We also made sure to build this in a way that anyone can use it, so it is not tied to either 2i2c or VEDA infrastructure!

Opening specific visualizations in QGIS via URL #

QGIS is the world’s most used open source GIS software, and 2i2c previously worked with Openscapes and QGreenland to bring this desktop software to JupyterHub. We had also worked on a container image that allows users to access large datasets stored in the cloud directly through QGIS on the JupyterHub, letting users work with much larger datasets than they could on their desktops by bringing cloud compute adjacent to the data. As a continuation of this work, we developed jupyter-remote-qgis-proxy, which builds QGIS-specific features on top of jupyter-remote-desktop-proxy. In particular, it allows the creation of shareable links that, when clicked, open specific datasets and layers in QGIS on a JupyterHub! You can see this in action:

Launching QGIS on a Linux desktop served by the VEDA JupyterHub

This opens up exciting future possibilities. Imagine this exploration of the Camp Fire having an ‘Open in QGIS’ button that enables further exploration of the data without the user needing to download or install anything! Work will continue in the coming quarter towards achieving this vision.

We are also excited to see recent work in this space from QuantStack and Simula Labs, and will follow up to ensure an orderly transition to more web native workflows for existing users of QGIS in due time.

Better Profile Selection #

This is a continuation of our GESIS collaboration. In the path to deploying dynamic image building to end users, we wanted to stabilize jupyterhub-fancy-profiles enough to deploy to users of VEDA (and eventually everyone else). This is the primary interface users see after they log in to JupyterHub, and was ripe for UX improvements. The default interface looks like this:

The revamped one is much more streamlined and looks like this:

Revamped Profile Screen

This is currently deployed to a staging hub and has helped us shake out a lot of bugs! We expect the improved interface will be rolled out to all users in the near future. We are also planning further development to make the user experience even better and smoother for everyone.

Supporting workshops #

End users benefiting from our work is what ultimately gives it meaning. To that end, we were very happy to support running workshops during this collaboration. See our related blog post US Greenhouse Gas Center supports summer school at CIRA for more information.

Ongoing Collaboration #

Delivering on these objectives in a timely way heavily depended on the success of the team collaboration. Sanjay Bhangar of Development Seed commented:

Working closely with the 2i2c team on growing features to support users on the VEDA and GHG Center hubs has been absolutely amazing. With 2i2c’s deep experience in the Jupyter ecosystem, we have been able to implement some fairly complex features quite easily, and their strong open-source roots have ensured that whatever we work on is broadly useful to the wider Jupyter and scientific computing communities.

Take a look at the companion Development Seed blog post of this work.

This collaboration continues, and we have now published our objectives for the coming quarter. Watch this space!

Acknowledgements #

Development Seed
NASA IMPACT
Tarashish Mishra, Julia Signell, Oliver Roick, Slesa Adhikari and Sanjay Bhangar for various code contributions towards these objectives

Enabling neuroscience in the cloud with HHMI Spyglass and MySQL on JupyterHub

Fri, 05 Jul 2024 00:00:00 +0000

The HHMI Spyglass tutorial

Spyglass #

Spyglass is a framework for reproducible and shareable neuroscience research produced by Loren Frank’s lab at the University of California, San Francisco. Check out our blog post about the release of their preprint to read more about the methods.

This post focuses on the complex data storage needed for the project, which can be difficult to set up locally or at scale in the cloud. In particular, the analysis needed a MySQL database for reproducibility. This is a fairly common task across many fields. The aim of 2i2c is to enable researchers to focus on the essential complexity of what they are doing, i.e. the science, without managing the accidental complexity of how to do it (in this case, setting up databases).

We describe how you can do this too for your own JupyterHubs. Since 2i2c commits to running our infrastructure in line with open-source values as much as possible, you can also directly see the configuration for the hub referenced in the paper.

What is a “sidecar container”? #

The Kubernetes definition of a sidecar container is

Sidecar containers are the secondary containers that run along with the main application container within the same Pod. These containers are used to enhance or to extend the functionality of the primary app container by providing additional services, or functionality such as logging, monitoring, security, or data synchronization, without directly altering the primary application code.

In this case, the primary app container is the JupyterLab instance where people are interactively running code and doing science. We want to provide a MySQL database as a sidecar so that each user server gets their own independent MySQL server instance (that is not accessible to anyone else). We can then run code such as

%%bash
mysql -h 127.0.0.1 -u root --password=tutorial < path-to-sql-file-with-data

to load data into the database. Note the IP address 127.0.0.1: the MySQL server is listening on localhost, even though it is not running in the same container! Thanks to the magic of Linux network namespaces, the sidecar and main app container can share 127.0.0.1. This allows you to write code that works in exactly the same way on a user’s local computer as on the JupyterHub, making transitions and replication easier.
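Because the sidecar starts alongside the notebook container, MySQL may not be accepting connections yet when a notebook first runs. One option is to wait until something is listening on 127.0.0.1:3306 before loading data. The helper below is a hypothetical, standard-library-only sketch; its timeout values are assumptions, and it checks only that the TCP port is open, not that MySQL has finished initialising.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 3306,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds, or give up.

    Returns True once something accepts connections, False after `timeout`
    seconds. Hypothetical helper; defaults target the MySQL sidecar.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Shares the pod's network namespace, so localhost reaches the sidecar.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # connection refused, timed out, etc.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port()` at the top of a notebook, before the `%%bash` cell that runs `mysql`, avoids a confusing "can't connect" error on a freshly started server.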

Setting up sidecars in JupyterHub on Kubernetes #

We’re leveraging multiple tools from the open-source ecosystem - JupyterHub, Kubernetes, Linux as well as MySQL itself.

Since sidecars are a Kubernetes feature, we can pass the configuration straight through to Kubernetes. There are two layers involved:

singleuser.extraContainers in z2jh configuration
KubeSpawner.extra_containers in KubeSpawner configuration

The hub configuration looks like this:

singleuser:
  extraContainers:
    - name: mysql
      image: datajoint/mysql:8.0 # following the spyglass tutorial at https://lorenfranklab.github.io/spyglass/latest/notebooks/00_Setup/#existing-database
      ports:
        - name: mysql
          containerPort: 3306
      resources:
        limits:
          # Best effort only. No more than 1 CPU, and if mysql uses more than 4G, restart it
          memory: 4Gi
          cpu: 1.0
        requests:
          # If we don't set requests, k8s sets requests == limits!
          # So we set something tiny
          memory: 64Mi
          cpu: 0.01
      env:
        # Configured using the env vars documented in https://lorenfranklab.github.io/spyglass/latest/notebooks/00_Setup/#existing-database
        - name: MYSQL_ROOT_PASSWORD
          value: "tutorial"
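If you configure KubeSpawner directly in `jupyterhub_config.py` rather than through z2jh's Helm values, the same sidecar can be expressed as a Python list for `KubeSpawner.extra_containers`, which takes Kubernetes container specs as dictionaries. This is a sketch mirroring the YAML above, not the exact configuration of the hub referenced in the paper; the variable names are illustrative.

```python
# MySQL sidecar, mirroring the z2jh singleuser.extraContainers example.
mysql_sidecar = {
    "name": "mysql",
    "image": "datajoint/mysql:8.0",
    "ports": [{"name": "mysql", "containerPort": 3306}],
    "resources": {
        # Best effort: at most 1 CPU; restart MySQL if it exceeds 4Gi of memory.
        "limits": {"memory": "4Gi", "cpu": 1.0},
        # Without explicit requests, Kubernetes sets requests == limits.
        "requests": {"memory": "64Mi", "cpu": 0.01},
    },
    "env": [{"name": "MYSQL_ROOT_PASSWORD", "value": "tutorial"}],
}

extra_containers = [mysql_sidecar]

# In jupyterhub_config.py, JupyterHub provides the config object `c`:
# c.KubeSpawner.extra_containers = extra_containers
```

Either form ends up in the same place: the sidecar is appended to the containers of each user's pod, next to the notebook server container.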

By setting this up, we allow users to insert the code snippet above

%%bash
mysql -h 127.0.0.1 -u root --password=tutorial < path-to-sql-file-with-data

into their Jupyter Notebooks, which gives access to their MySQL database in the hub!

However, this configuration does not persist the database itself between user server sessions. Thanks to a pilot in a prior collaboration with the University of Texas, Austin, we do have some documentation on how you can enable that as well!

Acknowledgements #

Howard Hughes Medical Institute
National Institute of Mental Health (NIMH), grant number RF1MH130623
kubespawner
zero-to-jupyterhub-k8s and the JupyterHub community

Hacking the Project Pythia Cook-off with MyST Markdown

Tue, 18 Jun 2024 00:00:00 +0000

Photo courtesy of Dr Debanjana Das

What is Project Pythia? #

Project Pythia is the education working group for Pangeo, a community platform for Big Data geoscience in which 2i2c operates a cloud hub. The core aim of Project Pythia is to spearhead the creation and curation of community-driven, open-source documentation, in the form of “cookbooks”, to enable the adoption of open, scalable and reproducible workflows for geoscientists.

What did 2i2c do? #

Jenny, James and Angus from the 2i2c team participated in the annual Project Pythia Cook-off 2024, a hackathon where cookbook authors and collaborators can spend dedicated time on creating and maintaining their content using Jupyter Book and deploying their cookbooks with GitHub actions.

2i2c teamed up with the infrastructure breakout group during the hackathon, led by Katelyn FitzGerald (UCAR) and Kevin Tyle (University at Albany), and members of the Curvenote team also joined the group.

Day 1 #

2i2c deployed and demonstrated a dedicated BinderHub service for Project Pythia that allowed hackathon participants to “self-serve” images of their software environment, which were specified by including a list of packages in an environment.yml file placed in their GitHub cookbook repository. Participants could then pull the image from a container registry into their 2i2c hub (or indeed, any other JupyterHub server) to share and reproduce their computational environments with ease.

Day 2 #

During the first half of the day, we quickly identified a number of issues that were proving to be a maintenance headache for the Project Pythia infrastructure group:

Configuration files for each cookbook were difficult to update at scale. Project Pythia currently have a gallery of over 30 cookbooks!
Changes to Sphinx-based themes inherited from upstream were prone to breaking custom Project Pythia branding downstream.
Executable content was not able to run on Project Pythia’s dedicated BinderHub hosted on JetStream2 (operated by NSF).
Cookbooks frequently cross-referenced materials from other cookbooks to build upon pre-existing knowledge, but this was not easy to author and the reader experience was not as smooth as it could be.

Following last month’s announcement that Jupyter Book 2.0 will use MyST, Rowan (Curvenote) and Angus (2i2c) delivered a compelling demonstration of the MyST ecosystem, centered around modern web-first technologies (JavaScript/TypeScript), that offers improved interactivity and accessibility.

In the second half of the day, we decided to use the hackathon to explore migrating the Pythia cookbooks from a Sphinx-based to a MyST-based document structure and engine. Within one afternoon, the group migrated four cookbooks to use MyST MD.

This moment was palpably exciting! It was evident that MyST MD supported backwards-compatible content out of the box, which alleviated fears of sunk cost in existing Sphinx-based cookbooks. The migration workflow was as simple as executing the following commands:

conda install mystmd

myst

Day 3 #

We spent this day tackling support for managing a gallery of Project Pythia cookbooks at scale. See the Executable Books blog post for technical details on how we:

Centralized configuration
Prototyped a gallery plugin in Python
Fixed a number of bugs related to integrated computation with Binder and JupyterLite
Embraced the referencing and reuse of content with simple markdown syntax for hover-references.

Day 4 #

Looking to the future, we spent time reflecting on our experiences and discussing the potential, transformative impact MyST MD tooling could have in the hands of the scientific community at large, including the communities served by 2i2c. Knowledge-sharing based on static figures and PDFs could become obsolete and give way to a dynamic, web-first approach to sharing interactive narratives backed by compute from a Jupyter server.

Throughout the course of the hackathon, the pace of iterative development for both end users of the community cookbooks and the developers of the open-source tooling was astounding. For example, we were able to quickly expose small bugs (e.g. support for HTML video tags) in the MyST MD tooling, which were immediately fixed upstream and released within minutes. The feedback loop that connected the user experience with the software tooling was incredibly synergistic, with immediate impact both upstream and downstream, and 2i2c hopes to continue replicating it across many facets of its operations.

Beyond the Project Pythia Cook-off, the breakout group will continue conversations around strengthening their community of practice and hopefully advocating for wider adoption of MyST MD amongst the scientific community (say hello to some of our group members at SciPy 2024 in July!).

Acknowledgements #

University at Albany (NSF award 2324302): Led the funding acquisition, helped organize and facilitate the event
UCAR (NSF award 2324303): Led the planning and logistics for the event
Project Pythia for organizing this workshop.
Jupyter Book for providing development and collaboration at this workshop.
2i2c / Code for Science and Society (NSF award 2324304): Provided tailored compute services and on-site support
Curvenote: Contributed engineering cycles to MyST MD development.

Jupyter Book 2.0 will use MyST

Tue, 21 May 2024 00:00:00 +0000

See the Executable Books blog for a post on the future directions of the Jupyter Book project, which will be built on top of the MyST Markdown engine.

Acknowledgements #

This post relates to our ongoing collaboration with the Jupyter Book project.
Thanks to Project Pythia for funding part of our work on Jupyter Book.

Security report for jupyter-server-proxy: CVE-2024-28179

Tue, 19 Mar 2024 00:00:00 +0000

What happened? #

A few weeks ago, the JupyterHub team discovered a security vulnerability in the jupyter-server-proxy package that could allow unauthenticated access to a JupyterHub via WebSockets, letting unauthenticated users run arbitrary code on the JupyterHub. jupyter-server-proxy is used by many communities to provide alternative user interfaces like RStudio and remote desktops.

This vulnerability was detected by the JupyterHub team, with leadership from 2i2c’s engineers. It was resolved through upstream contributions to the JupyterHub project, and we have deployed a fix that mitigates this vulnerability for all the hubs 2i2c manages.

Does this impact my 2i2c community hub? #

We do not believe that any of 2i2c’s communities were impacted by this vulnerability, and a patch has now been pushed to all community hubs to resolve this issue.

If your community was vulnerable to this problem, you might experience slightly longer startup times while we work out a long-term solution.

Since this is a vulnerability in the docker image used by our communities, we will be reaching out over the next few weeks to put a more permanent fix in place.

Where can I learn more? #

See the JupyterHub security advisory for CVE-2024-28179 for more information about the security vulnerability, including details on the mitigation we have put in place to protect our communities.

Conclusion #

We’re grateful that the JupyterHub community was quick to acknowledge, respond, and resolve this security vulnerability after it was brought to their attention. We’re also proud that 2i2c’s engineers helped the JupyterHub team throughout the process.

This allowed our team to resolve the problem before it impacted any of 2i2c’s communities. Because 2i2c community infrastructure is managed in a central location, we were able to resolve this for over 80 communities with a single team rather than expecting each community to learn about and fix this problem on their own.

We also believe this reflects the healthy upstream relationships that we hope to encourage with our team’s Open Source strategy and practices. By working with the JupyterHub community and pushing changes upstream, we’ve resolved this issue for every user of jupyter-server-proxy, not just 2i2c’s own ecosystem. In particular, because 2i2c runs hubs for many communities on Kubernetes, we were able to identify a solution that did not require every user image to be updated (as described in the advisory’s section “For JupyterHub admins of Z2JH installations”).

We believe that all of these lead to a healthier, safer ecosystem of open source tools ❤️.

Integrating BinderHub with JupyterHub: Empowering users to manage their own environments

Wed, 03 Jan 2024 16:56:14 -0800

Thanks to Arnim Bleier, Jenny Wong, Georgiana Elena, Damián Avila, Jim Colliander and James Munroe for contributing to this blog post

mybinder.org is a very popular service that allows end users to specify and share the environment (languages, packages, etc) required for their notebooks to run correctly by placing configuration files they are already familiar with (like requirements.txt or environment.yml) along with their notebooks. While not without its own set of challenges, this is extremely powerful because it puts control of the environment in the hands of the people who write the code. They can customize the environment to fit the needs of their code, instead of having to fit their code into the environment that admins have made available.

But mybinder.org (and the BinderHub software that powers it) is built for sharing your work after you are done with it, not for actively doing work. BinderHubs often have neither persistent storage nor persistent user identity, and the UX is centered around ephemeral interactivity that can be shared with others (via a link), rather than persistent interactivity that a single user repeatedly comes back to. JupyterHub is more commonly used for this kind of workflow, but doesn’t currently let users easily build their own environments. Admins running the JupyterHub can make multiple environments available for users to choose from, but this still puts admins in the critical path for environment customization.

Our collaboration with GESIS, NFDI4DS, and CESSDA aims to bring this flexibility directly to JupyterHub. We aim to empower users to decide for themselves which applications and dependencies are installed on a per-project basis. Our work enables communities with heterogeneous requirements to share a single hub, frees administrators from being overwhelmed by installation requests, and turns JupyterHub into a platform for collaborative computational reproducibility. In this update, we report on our progress and next steps in this project.

What does a BinderHub do, exactly? #

It is helpful to understand that BinderHub primarily has 3 responsibilities:

1. Present a UI to the end user for them to provide details on what to build (this is what you see when you go to mybinder.org)
2. Call out to repo2docker in a scalable way to actually build and push an image containing the environment for the given repository, and show the user logs as this build process happens. This also allows users to debug issues with their build more easily.
3. Talk to a JupyterHub instance to launch a user server with the built docker image, and redirect the user to this.

Responsibility (2) is the core feature of BinderHub, and we focused on making it available to JupyterHub users. It was important to us that this be done in a way that everyone, not just 2i2c, can use sustainably. This blog post discusses the improvements we made across the Jupyter ecosystem to get this done.
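Responsibility (2) is exposed over HTTP. As we understand it, a client requests a path like `/build/gh/<org>/<repo>/<ref>` and BinderHub replies with a stream of server-sent events, each carrying a JSON payload with a `phase` and a `message`, which is how build logs reach the user. The sketch below models only the client side of that stream. The sample events, the exact path, and the phase names are assumptions for illustration, not a precise description of BinderHub’s event schema.

```python
import json
from typing import Iterable, Iterator


def parse_build_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per `data:` line of a server-sent event stream.

    Assumes each event is a single `data: {...json...}` line, which is how
    this sketch models BinderHub build responses.
    """
    for raw in lines:
        line = raw.strip()
        # Blank lines separate events; anything else (e.g. ':' comments) is skipped
        if not line.startswith("data:"):
            continue
        yield json.loads(line[len("data:"):].strip())


def final_phase(lines: Iterable[str]) -> str:
    """Return the last phase seen in the stream, or 'unknown' if none."""
    phase = "unknown"
    for event in parse_build_events(lines):
        phase = event.get("phase", phase)
    return phase


# Hypothetical sample stream, as a client might receive it line by line
sample = [
    'data: {"phase": "building", "message": "Step 1/10"}',
    "",
    'data: {"phase": "pushing", "message": "Pushing image"}',
    "",
    'data: {"phase": "ready", "message": "Image built"}',
]
print(final_phase(sample))
```

A UI like the one described below only needs to show each `message` as it arrives and react to the final phase, which is why decoupling the build API from BinderHub’s own UI is useful.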

Demo #

But first, a very quick demo of what this looks like right now!

This is very much a work in progress, but the basic flow can be seen clearly. Users see a Server Options menu after they log into JupyterHub. They can specify the two primary things that determine the server configuration:

The resources allocated (RAM, CPU and maybe GPU)
The environment (container image) used, which can be specified in one of 3 ways:

a. A pre-selected list of environments (container images), provided by the administrators who set up this JupyterHub
b. A blank text box where users can enter any publicly available Docker image they want
c. A mybinder.org-style way to specify a GitHub repository, which is then dynamically built into a Docker image for the user!

So what did we need to do to accomplish this, in a way that’s very upstream friendly and usable by everyone (and not just 2i2c)?

A Standalone `binderhub-service` helm chart #

The default upstream BinderHub helm chart includes a JupyterHub as a dependency, and configures itself to be used primarily in a manner similar to mybinder.org. As the person who helped make that choice early on, I can tell you why it was made - for convenience! And it was very convenient, as it allowed us to get mybinder.org going fast. However, it makes it difficult to install a BinderHub service alongside an existing JupyterHub. To this end, we have created a standalone BinderHub helm chart, designed to be installed alongside an existing JupyterHub, so we can use it purely to build images. This allows the BinderHub instance to be used as a JupyterHub Service, which is what we want.

While this helm chart is currently under the 2i2c GitHub org, the hope is that it can eventually migrate to a jupyterhub-contrib organization (once it is created), or it can become the upstream helm chart for BinderHub if enough work can be done in BinderHub to allow it to serve use cases like mybinder.org.

As part of this work, we also added a way for BinderHub to run in API only mode, so we can fully turn off the UI and launching ability of BinderHub. This change decoupled the three responsibilities of BinderHub we discussed previously, allowing us to bring our own UI and JupyterHub. BinderHub could now be used purely for its scalable image building features, which is exactly what we want!

Sustainably extending KubeSpawner’s `profileList` #

We identified KubeSpawner’s profileList feature as the ideal location for UI to dynamically build environments (container images), making it just another ’environment choice’ people can choose, along with picking the resources their server needs. From an end-user perspective, it was also the logical place for them to specify a repository to build into an environment, as they could already choose some pre-built environments from here. They can also select other arbitrary resources they want (such as memory, GPU, etc) from here as well. From a maintainer perspective, it helps with long-term maintenance of the JupyterHub projects.

The implementation of profileList, however, was not easy to extend at this point. So this PR made it easier to extend in more complex ways, without complicating the implementation in KubeSpawner itself. Even though this had no visible end-user effects, it was an extremely important step in allowing us to experiment with UI sustainably, without having to rely on upstream. These kinds of changes can sometimes be hard to sell to stakeholders, but they are extremely important in ensuring a continuous and sustainable relationship with upstream.

Implementing `unlisted_choice` feature in KubeSpawner #

The profileList feature was built to allow JupyterHub admins to specify an explicit list of container images the end-user can choose from. It had no way to use a choice that had not been pre-approved by the admin. We needed this feature because the BinderHub API builds a new docker image for each environment the user wants, so the image cannot be chosen from a pre-approved list. We had to add this feature to KubeSpawner safely, in a way that was generally useful to everyone. Many other communities had been asking for such a feature anyway: the ability to simply ’type in’ an image and have that be used.

NASA VEDA was one such community, so we partnered with Sanjay Bhangar from Development Seed (an organization that helps run NASA VEDA) to implement this feature. Engineers from 2i2c contributed heavily to this feature as well, and after several PRs (1, 2, 3, 4 and 5), this feature is now available for everyone to use!

A key component of doing sustainable upstream work is that every addition needs to be useful by itself for a broad group of people. This change was very helpful for many communities that wanted to give their users the freedom to pick whatever image they want, regardless of whether they wanted to use dynamic image building. The broad interest allowed us to build a coalition with other interested parties, and get the change accepted upstream more easily!
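To make the `unlisted_choice` idea concrete, here is a minimal sketch. The `profile_list` structure follows the shape KubeSpawner documents for `profile_options` and `unlisted_choice`, with `{value}` standing in for whatever the user typed. The image names and the validation regex are examples only. The `resolve_overrides` function is a hypothetical, simplified model of how a form selection turns into spawner overrides. It is not KubeSpawner’s actual implementation.

```python
import re
from typing import Optional

# Shaped like a KubeSpawner profile_list entry; in jupyterhub_config.py this
# list would be assigned to c.KubeSpawner.profile_list. Images are examples.
profile_list = [
    {
        "display_name": "Choose an environment",
        "default": True,
        "profile_options": {
            "image": {
                "display_name": "Image",
                "choices": {
                    "scipy": {
                        "display_name": "Jupyter SciPy notebook",
                        "kubespawner_override": {
                            "image": "quay.io/jupyter/scipy-notebook:latest"
                        },
                    },
                },
                # Lets the user type in an image that is not listed in "choices"
                "unlisted_choice": {
                    "enabled": True,
                    "display_name": "Custom image",
                    "validation_regex": "^.+:.+$",
                    "validation_message": "Must be an image name with a tag",
                    # "{value}" is replaced with what the user typed
                    "kubespawner_override": {"image": "{value}"},
                },
            },
        },
    },
]


def resolve_overrides(profile: dict, option: str, choice: Optional[str] = None,
                      unlisted_value: Optional[str] = None) -> dict:
    """Hypothetical model: turn one form selection into spawner overrides."""
    spec = profile["profile_options"][option]
    if unlisted_value is None:
        # A pre-approved choice: use its overrides as-is
        return dict(spec["choices"][choice]["kubespawner_override"])
    unlisted = spec["unlisted_choice"]
    if not unlisted.get("enabled"):
        raise ValueError("free-form values are not allowed for this option")
    pattern = unlisted.get("validation_regex")
    if pattern and not re.fullmatch(pattern, unlisted_value):
        raise ValueError(unlisted.get("validation_message", "invalid value"))
    # Substitute the typed value into every string override
    return {
        key: val.replace("{value}", unlisted_value) if isinstance(val, str) else val
        for key, val in unlisted["kubespawner_override"].items()
    }


print(resolve_overrides(profile_list[0], "image", unlisted_value="ghcr.io/example/my-env:v1"))
```

Under these assumptions, an image freshly built by BinderHub can be passed through the same free-form path as a typed-in image, which is roughly why this feature mattered for dynamic image building.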

`jupyterhub-fancy-profiles` #

Once we had all these pieces in place, it was time to actually work on the frontend UI that would allow users to build images dynamically and launch them. Since this will replace the ‘profileList’ feature, it should also allow them to select different resources (RAM, CPU, etc) as needed, as well as type in an existing image if they desire. So it was a full re-implementation of the profileList frontend.

This work is ongoing at the jupyterhub-fancy-profiles project. It is a pure frontend web application, written in JavaScript using modern frontend tooling (React, webpack, Babel, etc.). It has gone through a few revisions, and the demo earlier in this blog post shows its current state. Because the default profileList implementation is pure HTML / CSS with very minimal JS, it is limited in the kind of UX it can offer. jupyterhub-fancy-profiles aims to be very helpful even when dynamic image-building features are not enabled on a JupyterHub. We hope to roll this out to a few JupyterHubs and improve it over time based on feedback.

`@jupyterhub/binderhub-client` npm package #

While building jupyterhub-fancy-profiles, we wanted to reuse the JavaScript code that the BinderHub frontend uses to interact with the BinderHub API, instead of re-implementing it. However, the existing BinderHub JavaScript code was not easily consumable by external projects. We refactored the code, added tests, migrated it to modern JS practices, and published the @jupyterhub/binderhub-client npm package, which can be used not just by jupyterhub-fancy-profiles but by any external project that talks to the BinderHub API.

This had to be done in such a way that current BinderHub installations (such as mybinder.org) do not break. That took quite a few pull requests: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15. This refactoring work was very helpful to us, and also appreciated by the broader community.

Defending against cryptojacking with `cryptnono` #

For Open Science to flourish, we need to allow access to resources without login / paywalls wherever possible. A new menace against this has been cryptojacking, where attackers use up any and all available free compute to mine cryptocurrencies. This has affected many folks on the internet, including GitHub Actions and mybinder.org, the primary public BinderHub installation. mybinder.org has some extra protections against cryptojacking that aren’t easily usable elsewhere, and this has unfortunately meant that the demo JupyterHubs we have with these features enabled have been behind a login wall. I personally believe login walls are, in the long term, antithetical to open science, and so this was an important problem to solve.

cryptnono is an open source project designed to help fight cryptojacking, and as part of this grant we ported some of this functionality out of mybinder.org-specific code into cryptnono, so other deployments may also benefit from it! We also migrated to the highly efficient eBPF Linux kernel subsystem, allowing more complex heuristics that catch a much broader range of cryptomining activity. We have been gradually tweaking the config on mybinder.org, and it has proven very effective! This will be very helpful for anyone who wants to provide a JupyterHub (or any other computational service) without a login wall. If you are interested in using cryptnono in this fashion, please reach out to us so we can work together!
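cryptnono observes processes through eBPF, and its real heuristics are more involved than anything shown here. Purely as a hypothetical illustration of one simple kind of check (matching a newly started process’s command line against a list of banned substrings), a userspace sketch might look like this. The pattern list is an example, not cryptnono’s configuration, and nothing here traces or kills processes.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical example patterns; a real deployment maintains its own list
BANNED_SUBSTRINGS: Tuple[str, ...] = ("stratum+tcp://", "xmrig", "--donate-level")


def matched_pattern(argv: Iterable[str],
                    banned: Tuple[str, ...] = BANNED_SUBSTRINGS) -> Optional[str]:
    """Return the first banned substring found in the command line, if any."""
    # Join argv so patterns spanning arguments are still visible; compare case-insensitively
    cmdline = " ".join(argv).lower()
    for pattern in banned:
        if pattern.lower() in cmdline:
            return pattern
    return None


print(matched_pattern(["./xmrig", "-o", "stratum+tcp://pool.example:3333"]))
print(matched_pattern(["python", "analysis.py"]))
```

Substring matching alone is easy to evade by renaming a binary, which is part of why moving to richer kernel-level heuristics, as described above, catches a broader range of mining activity.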

Explored pathways that were then discarded #

List of things that were tried and then decided as not good pathways:

repo2docker-service, a separate JupyterHub service that could only build images. As we worked on it, we realized that it was replicating a lot of features that BinderHub already has, so we pivoted to working on BinderHub directly instead.
Building off of tljh-repo2docker. While this already had a nice UI, it would be hard to port it to run on a distributed Kubernetes environment without it becoming a ‘hard fork’.

While these explorations slowed down the project, they have made us very confident that the methods we have chosen are sustainable in the long term.

Want to try this out? #

We have a demo of this running at imagebuilding-demo.2i2c.cloud, but because we are still fine-tuning the cryptnono configuration, it is not yet open to the public. Please contact me with your GitHub account if you want access, promise not to be a cryptominer, and you shall be granted access.

Want to set this up on your own JupyterHub? There is some work-in-progress documentation, with more on the way. Drop a line in the linked pull request and we’ll be happy to help. The eventual goal is for anyone to be able to simply follow the documentation and set this up for themselves.

We also have user facing documentation on using this service on docs.2i2c.org.

Future work #

This is not complete of course, and there is a lot of future work to be done.

mybinder.org also helps you distribute your content, not just the environment for your code to run in. Since JupyterHub usually comes with a persistent home directory for the user, nbgitpuller is commonly used for this purpose instead. We should explore ways to integrate nbgitpuller (and other ways to distribute content) in the future.
More thorough documentation for how you can recreate what is in the demo for yourself in your own JupyterHub installation.
Better UX for specifying images, including figuring out how to ‘save’ them for future reuse.
Better compatibility with mybinder.org, particularly in allowing other sources of environments (not just GitHub, but Zenodo, raw git repositories, etc) and URL compatibility.
Better authentication workflow between the frontend and the BinderHub API.

Credit #

All this work would not be possible without a large group of collaborators!

From 2i2c: Erik Sundell, Georgiana Elena, Yuvi, James Munroe, and Damián Avila.
The persistent BinderHub project was the direct inspiration for all this work, with particular thanks to Kenan Erdogan.
The tljh-repo2docker project, which explores similar ideas in the context of running only on a single node.
The broad JupyterHub and MyBinder.org community, particularly Simon Li and MinRK.
Funding generously provided by GESIS in cooperation with NFDI4DS (project number: 460234259) and CESSDA.
Arnim Bleier from GESIS was instrumental in making this project happen.

2i2c supports Jupyter Docker Stacks ARM builds

Fri, 01 Dec 2023 00:00:00 +0000

The Jupyter Docker Stacks project provides a collection of ready-to-use Docker images for Jupyter environments. These images are used by many in the Jupyter community, including 2i2c, which uses them as base images for our JupyterHub deployments.

The project recently began publishing ARM-compatible images alongside the standard x86 images, making it easier for users with ARM-based systems (like M1 Macs) to use these environments. However, building and hosting these ARM images comes with additional cloud computing costs that were being personally covered by @mathbunnyru, one of the project’s maintainers.

A part of 2i2c’s mission is supporting upstream communities that we rely on, especially where the upstream project has limited resources. For this reason, we’ve decided to support the Jupyter Docker Stacks’ ARM build costs, with a total budget of $2000 (approximately $150 per month). As a regular user and beneficiary of the Jupyter Docker Stacks, we believe it’s important to contribute to the maintenance and sustainability of this crucial piece of infrastructure that benefits the entire Jupyter community.

We hope this support helps the Docker Stacks project remain healthy, and continue providing high-quality, multi-architecture images that work across different computing platforms. We’ll revisit this decision as the landscape of technology providers changes and other options arise.

Acknowledgments #

Thanks to Project Jupyter (particularly the jupyter-stacks team) for this project.

A QGIS desktop in the cloud with JupyterHub

Sat, 05 Aug 2023 00:00:00 +0000

The QGreenland Researcher Workshop

JupyterHub is a versatile platform that can serve a desktop with Geospatial Information Systems (GIS) software in the cloud. This was demonstrated by the QGreenland Researcher Workshop that was hosted by the NASA CryoCloud hub. The hands-on workshop trained 25-30 researchers, from Germany, India, France, Canada, Poland and the United States, on how to work with geospatial data in an open science framework.

QGreenland Overview #

QGreenland is an open-source geospatial data package designed for QGIS, a community-owned GIS platform. It focuses on Greenland, offering researchers and educators a comprehensive toolset for FAIR (findable, accessible, interoperable and reusable) data analysis. The package integrates a variety of datasets into a single, easy-to-use data-viewing and analysis platform, supporting both offline and online use. This makes it particularly valuable for remote fieldwork and areas with limited internet access.

Workshop Success #

The QGreenland workshop demonstrated several key benefits of using JupyterHub for cloud-based GIS:

Accessibility: Participants from across the world could access the same powerful GIS tools through a web browser, eliminating the need for complex local installations while enhancing reproducibility.
Cloud Block Storage: Using a JupyterHub in the cloud allowed for faster data access than a traditional NFS file store by provisioning each user with an elastic block store disk, reducing load times from 5 minutes to under 3 seconds.
Cost Efficiency: Utilizing the CryoCloud JupyterHub instance managed by 2i2c drastically cut down setup costs and time, with only minimal cloud operating expenses of roughly $1/person/day.

Conclusion #

The success of the QGreenland workshop underscores the potential of integrating interactive software applications in JupyterHub. This approach not only democratizes access to advanced geospatial tools but also fosters a collaborative research environment. We look forward to supporting more workshops for QGreenland in the future!

Want to know more? Check out the companion post by QGreenland on the Jupyter Blog.

Acknowledgements #

Trey Stafford (CIRES)
Matthew Fisher (CIRES)
*Fisher, M., *T. Stafford, T. Moon, and A. Thurber (2023). QGreenland (v3) [software], National Snow and Ice Data Center.
Snow, Tasha, Millstein, Joanna, Scheick, Jessica, Sauthoff, Wilson, Leong, Wei Ji, Colliander, James, Pérez, Fernando, Munroe, James, Felikson, Denis, Sutterley, Tyler, & Siegfried, Matthew. (2023). CryoCloud JupyterBook (2023.01.26). Zenodo. 10.5281/zenodo.7576602

* Denotes co-equal lead authorship

On the Jupyter Blog: From intern to mentor.

Fri, 30 Jun 2023 00:00:00 +0000


CILogon usage at 2i2c

Fri, 24 Feb 2023 00:00:00 +0000

About CILogon #

CILogon is an open source service that allows users to log in with over 4,000 identity providers, including campus identity providers. These identity providers are members of InCommon, a federation of universities and other organizations that provide single sign-on access to various resources.

CILogon and 2i2c #

For the past year, 2i2c has been successfully using CILogon for more than fifteen of the hubs it manages.

Currently, most of the hubs that use it are hubs for communities in education that want to manage their hub access through their own institutional providers.

By using a tool like CILogon, we allow hub access to be managed both through the communities’ institutional providers and through social providers like GitHub and Google. Because both authentication mechanisms can coexist, there’s no need to provide specific credentials for 2i2c staff to access the hub. This reduces both the burden on institutions’ IT departments and the complexity of a hub deployment.

Moreover, as we migrate away from our current Auth0 setup, the number of hubs using CILogon will grow further over the coming year.

The setup #

The setup that 2i2c uses is based on two important tools: the CILogon administrative client and the JupyterHub CILogonOAuthenticator.

The CILogon administrative client #

The 2i2c administrative client provided by CILogon allowed us to automatically manage the CILogon OAuth applications needed for authenticating into the hub.

For each hub that uses CILogon, we dynamically create an OAuth client application in CILogon and store the credentials safely, using the script at cilogon_app.py. The script can also be used to update the callback URLs of an existing OAuth application, delete a CILogon OAuth application when a hub is removed or changes authentication methods, get details about an existing OAuth application, and list all existing 2i2c CILogon OAuth applications.
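The per-hub work this script automates boils down to registering an OAuth client whose redirect URI points at that hub. As a minimal sketch, assuming an RFC 7591-style dynamic client registration request (this does not reproduce cilogon_app.py, and the hub domain, client name, and scope string are illustrative), the metadata for one hub might be built like this. `/hub/oauth_callback` is OAuthenticator’s default callback path when JupyterHub is served at the domain root. Both helpers are hypothetical.

```python
import json


def registration_metadata(hub_domain: str, client_name: str) -> dict:
    """Hypothetical helper: RFC 7591-style client metadata for one hub."""
    return {
        "client_name": client_name,
        # Where CILogon sends users back after they log in
        "redirect_uris": [f"https://{hub_domain}/hub/oauth_callback"],
        "grant_types": ["authorization_code"],
        # Space-separated scope string, as RFC 7591 specifies
        "scope": "openid email profile org.cilogon.userinfo",
    }


def with_new_domain(metadata: dict, new_domain: str) -> dict:
    """Return a copy whose callback URL points at a renamed hub."""
    updated = dict(metadata)  # shallow copy; the original is left untouched
    updated["redirect_uris"] = [f"https://{new_domain}/hub/oauth_callback"]
    return updated


meta = registration_metadata("example.2i2c.cloud", "example-hub")
print(json.dumps(with_new_domain(meta, "new-example.2i2c.cloud"), indent=2))
```

Registering a client returns a client ID and secret, which the hub’s authenticator is then configured with; that is why the script needs to store them safely.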

The JupyterHub CILogonOAuthenticator #

For CILogon’s integration with JupyterHub’s authentication workflow, we’re using the CILogonOAuthenticator, which is part of the JupyterHub OAuthenticator project. This is what allows JupyterHub to use common OAuth providers for authentication, and it’s also a base for writing other Authenticators with any OAuth 2.0 provider.

As part of 2i2c’s integration with the JupyterHub CILogonOAuthenticator, some important upstream fixes and enhancements to OAuthenticator were identified and made. For example, the GHSA-r7v4-jwx9-wx43 vulnerability was reported and fixed, and a migration guide was provided describing the breaking changes, together with step-by-step instructions for users on how to update their usage of the JupyterHub CILogonOAuthenticator.

Read more about how CILogon is set up for use at 2i2c in the docs.

Celebration #

Thanks to the 2i2c - CILogon partnership, during this past year we were able to integrate CILogon into 2i2c’s infrastructure and see its importance and usefulness, as well as CILogon’s great support, for 2i2c and the communities we serve.

We are now happy to announce that the 2i2c - CILogon partnership has been extended for another year!

Acknowledgements: The upstream jupyterhub-oauthenticator project mentioned in this post as being used at 2i2c is a JupyterHub package, kindly developed and maintained by the JupyterHub community and the 2i2c integration described was developed by the 2i2c engineering team. Also, this post was edited by Jim Basney.

GESIS - 2i2c collaborate to build a persistent BinderHub experience

Mon, 28 Nov 2022 00:00:00 +0000

Introduction #

Mybinder.org enables researchers across the world to replicate computational environments in the cloud. It allows researchers to turn static code into interactive literate coding environments with a click of a button within seconds. The mybinder.org service is powered by BinderHub, an open-source tool developed by the Jupyter Project that many organizations have deployed for their own communities. It does this by dynamically building the software environment needed to reproduce a computation (using a tool called repo2docker), and making this environment available to users.

BinderHub was developed for use-cases that are temporary and fully open by design. BinderHub sessions are destroyed after a fixed amount of time and there is no persistent storage or authentication. However, many research institutions also need more “standard” service features like authentication and persistent storage.

Over the past several years, the GESIS Notebooks team made the first steps towards bridging this gap through their Persistent BinderHub implementation. This was a modified and authenticated BinderHub that included persistent storage across sessions. The Persistent BinderHub service was very successful at GESIS and with its partner communities, and the team wishes to build this functionality into the JupyterHub community’s core technology so that these features can be enjoyed for more use-cases and by many communities.

To enable this vision, we have partnered with GESIS in cooperation with NFDI4DS (GAN: 460234259), CESSDA, and members of the JMTE project. This collaboration has three primary goals:

Generalize the Persistent BinderHub functionality/experience to run on cloud-agnostic infrastructure, so that other stakeholders in NFDI, CESSDA, and the broader scientific community may benefit from this functionality and experience.
Upstream this functionality by making contributions into Jupyter community projects, so that it will be maintained and improved by a community moving forward, thus improving its reliability and sustainability.
Improve the implementation and user experience around Persistent BinderHub, in order to make it more reliable, scalable, productive, and enjoyable to use.

We began this collaboration several months ago, and have focused our efforts on exploring potential implementation pathways for this functionality. We believe that we now have a path forward for this functionality, and this blog post is a brief report of our efforts and future plans as we undertake this effort. See this GitHub Projects Board for issues that implement this effort.

Exploration 1: Adding persistent storage directly into BinderHub #

Our initial intention was to incorporate persistent storage and authentication from the GESIS Persistent BinderHub into the BinderHub codebase. We began by holding a series of meetings to discuss technical requirements from our experience in the JupyterHub/BinderHub ecosystem, and also conducted an audit of the Persistent BinderHub codebase. The Persistent BinderHub implementation is a modified Helm Chart that configures a JupyterHub to expose its authentication and persistent storage functionality, overriding the BinderHub default behavior. We were concerned that building this functionality natively into BinderHub would be challenging given that the BinderHub codebase was designed for ephemeral user sessions.

So, we decided to take another approach:

Exploration 2: Add dynamic image building to JupyterHub #

We realized that there is a way to make this functionality more broadly useful and more maintainable, while still achieving the end-user experience that the GESIS team needed. Instead of modifying BinderHub to incorporate JupyterHub’s storage and authentication features, we would give JupyterHub the ability to dynamically generate user environments using repo2docker.

This would give JupyterHub users more flexibility over the environments served by their hub, and expose Binder-style workflows to the “typical” JupyterHub workflow. BinderHub could then be simplified to re-use JupyterHub’s image building functionality as a part of its own service. We also identified a prototype of this functionality in the tljh-repo2docker project that QuantStack had built for the PlasmaBio project. This implementation was seen as successful, and something others in the community had wanted to generalize for some time.

Our implementation plan #

Two phases of implementation #

With this alternative implementation route in place, we identified two major steps to accomplish this project:

Build a back-end for dynamic environment building. JupyterHub needs to understand how to call repo2docker’s image generation from a Docker-based environment. It needs to expose this ability via APIs that others can build interfaces on top of.
Build a front-end that is user-friendly and accessible. Once the back-end is functional, we must build a front-end experience that feels familiar to BinderHub users and is easy and intuitive to use.

Here are a few tasks that we’re carrying out next to make progress on the above two items.

Build a working prototype for image generation via a JupyterHub Service (see below for current status)
Research the tljh-repo2docker code base to understand how we could build upon its UX and functionality.
Understand the typical process that GESIS and NFDI users follow in their BinderHub workflows, to ensure that it can be replicated via this new implementation.
Perform UI/UX research validation to inform the implementation from a user’s perspective.

As a follow-up, we’ll likely re-work the BinderHub codebase to utilize JupyterHub’s new repo2docker service, rather than defining its own custom repo2docker functionality.

Back-end implementation as a JupyterHub Service #

We’re planning to use JupyterHub Services to add the back-end functionality for dynamic image building to a JupyterHub. Services are a way to expose functionality via a JupyterHub REST API, and would allow us to expose basic image generation on-the-fly with repo2docker. We aim for basic functionality to be as close as possible to repo2docker’s default behavior, but to make this functionality composable and customizable if a JupyterHub administrator wants to provide different out-of-the-box functionality.
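A service like this ultimately has to turn a repository and ref into a built, pushed image. As a minimal sketch, assuming the service shells out to the repo2docker CLI, the command could be assembled as follows. The flags `--no-run`, `--ref`, `--image-name` and `--push` are real repo2docker options, but the image-naming scheme and registry prefix are hypothetical. The sketch only builds the argument list; actually running it with `subprocess.run` would require Docker and registry credentials.

```python
import re
import shlex
from typing import List


def image_name_for(registry_prefix: str, repo_url: str, ref: str) -> str:
    """Hypothetical naming scheme: <prefix>/<sanitized repo>:<sanitized ref>."""
    # Drop the URL scheme and collapse anything outside [a-z0-9] into '-'
    slug = re.sub(r"[^a-z0-9]+", "-", repo_url.lower().split("://", 1)[-1]).strip("-")
    # Docker tags allow [A-Za-z0-9_.-] up to 128 chars; other tag rules are not enforced here
    tag = re.sub(r"[^A-Za-z0-9_.-]+", "-", ref)[:128]
    return f"{registry_prefix}/{slug}:{tag}"


def repo2docker_command(repo_url: str, ref: str, image_name: str,
                        push: bool = True) -> List[str]:
    """Build (but do not run) a repo2docker invocation that only builds the image."""
    cmd = ["jupyter-repo2docker", "--no-run", "--ref", ref, "--image-name", image_name]
    if push:
        cmd.append("--push")
    cmd.append(repo_url)  # the repository to build goes last
    return cmd


name = image_name_for("registry.example.org/builds", "https://github.com/org/repo", "main")
print(shlex.join(repo2docker_command("https://github.com/org/repo", "main", name)))
```

Keeping command construction separate from execution makes the service easy to test, and gives administrators a single place to customize behavior, in line with the composable design described above.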

While details may still change, we believe the overall architecture is now settled. We have an experimental codebase with a basic implementation of the repo2docker service described above. As part of this effort, the team has also made a number of improvements to repo2docker’s codebase and project infrastructure. We hope this work will continue to direct resources and attention to repo2docker by growing the number of users and stakeholders who rely on the project for their success.

Front-end implementation that uses this service #

Once the back-end exists, we can use it to prototype user interactions that trigger and use repo2docker’s image generation. Bringing dynamic environment building into JupyterHub significantly expands its functionality, and may introduce new kinds of workflows that we hadn’t initially imagined. Understanding, interpreting, and extending the original “Persistent BinderHub” workflow will require a deeper understanding of user stories and needs, so that we can identify workflows that feel natural in both JupyterHub and Binder-like scenarios.

For example, here are a few major UI/UX questions we must answer:

When a user builds an image, should it also become available to other users?
How can a user store, find, and delete old images that they’ve built?
How should new versions of the same image be handled?
Should we simply mimic the mybinder.org UX, or improve on it?
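The questions about storing, finding, and versioning images also have a data-model side. Purely as one illustration, not a decision we have made, the hypothetical `ImageCatalog` below keeps every rebuild of a repository as a new version per owner, lets an owner mark an image as shared, and prunes old versions. It only tracks catalog entries; deleting the underlying images from a registry would be a separate step.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class BuiltImage:
    owner: str
    repo: str
    ref: str
    image: str
    shared: bool = False  # hypothetical flag: may other hub users launch it?
    built_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ImageCatalog:
    """Hypothetical in-memory index of user-built images, keyed by (owner, repo)."""

    def __init__(self) -> None:
        self._versions: dict[tuple[str, str], list[BuiltImage]] = {}

    def add(self, entry: BuiltImage) -> None:
        # Every rebuild of the same repository becomes a new version, newest last.
        self._versions.setdefault((entry.owner, entry.repo), []).append(entry)

    def visible_to(self, user: str) -> list[BuiltImage]:
        """Latest version of each image the user owns or that its owner has shared."""
        return [
            versions[-1]
            for (owner, _repo), versions in self._versions.items()
            if versions and (owner == user or versions[-1].shared)
        ]

    def prune(self, owner: str, repo: str, keep: int = 1) -> list[BuiltImage]:
        """Drop all but the newest `keep` versions and return the removed entries."""
        if keep < 1:
            raise ValueError("keep must be at least 1")
        versions = self._versions.get((owner, repo), [])
        removed = versions[:-keep]
        del versions[:-keep]  # mutates the stored list in place
        return removed
```

Keying by owner and repository answers "what about new versions?" one particular way (per-user history); a shared, hub-wide history keyed only by repository would be an equally plausible alternative.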

We must answer these and several other questions next. With that vision in place, we’d like to expand our efforts in UI/UX and user research. This will help inform the technical implementation of this work as we expand on our prototype, and help us choose the right way to expose this functionality to users. We welcome collaboration around this work. If you know of an organization that is interested in collaborating, please reach out.

Collaborate and follow along #

Below you’ll find a rough project plan to give an idea of the major actions needed and a timeline for when we hope they’ll be completed. We’ll track further updates and progress on this project in this dedicated GitHub issue and this dedicated GitHub project board.

After a few months of working on this project, we are even more excited about the potential for dynamically building environments in a JupyterHub. We believe that it adds a new class of workflows to JupyterHubs that were not possible before, and that it will be immediately useful to the hundreds of communities that deploy JupyterHub for their users.

The why, what, and how of our NASA Openscapes cloud infrastructure: 2i2c JupyterHub and corn environment

Thu, 17 Nov 2022 00:00:00 +0000

We recently shared a demo of our infrastructure stack with the Openscapes community. Check out the blog post about it here.

Enabling / creating / X outcome by doing Y thing

Wed, 01 Jan 1000 00:00:00 +0000

One or two sentences about what happened and who did it. Link to the appropriate people, orgs, etc!

One or two sentences describing why this is important, why it’s valuable, or what it means.

Embed any relevant images, videos, or YouTube videos. Save the featured image in the same folder as featured.png so it shows up as a feature preview.

Learn more #

Bulleted list of URLs where readers can learn more or follow along.

Acknowledgements #

Bulleted list of people and organizations to thank, with links to their spaces.

EXTRA EXAMPLES #

Service post: blog/2025/status-page/index.md
Impact post: blog/2025/hackweek-shoutout/index.md

TIL: How to do XYZ thing for Y outcome

Wed, 01 Jan 1000 00:00:00 +0000

One or two sentences setting context about a common problem.

One to many sections describing something we’ve tried and what we’ve learned in solving that problem.

Embed any relevant images, videos, or YouTube videos. Save the featured image in the same folder as featured.png so it shows up as a feature preview.

Learn more #

Bulleted list of URLs where readers can learn more or follow along.

Acknowledgements #

Bulleted list of people and organizations to thank, with links to their spaces.

EXTRA EXAMPLES #

Here’s a blog post with an example of what we’re looking for: blog/2025/github-action-secrets-forked-repositories/index.md

