
Stone Soup Software Development

How a classic child's tale can influence software development in a positive direction.

Chris J. Karr

04 Mar 2025

"Cooking Soup on Campfire" by Vladimir Srajber

For over a decade and a half, I ran Audacious Software, a single-person software company that catered primarily to research universities using methods in areas where I had the good fortune to be present from the start: passive sensing on mobile phones, SMS-delivered surveys and interventions, social media research, and more.

As a contractor for my university clients, the standard legal agreements (provided by the institutions) stipulated that the software I was writing was a work-for-hire and that the work I produced belonged to them to do with as they pleased. This isn’t unusual, and makes sense for a lot of bespoke custom software development that universities outsource to folks like me.

Unfortunately, that created a big problem. Since a lot of my work came through word-of-mouth referrals – another researcher in another institution seeking to replicate or tweak something I did for a client – I had to find a way to support the new prospect without stepping on the intellectual property rights of my original customer.

If you talk to a traditional IP lawyer, you basically have two options to deal with this:

Find a way to license the software from the original customer to the new client’s institution.
Find a way to rebuild from scratch a new system that functions pretty much identically to the original system (for the purposes of scientific replication), but doesn’t do so in a manner that would violate the original client’s copyright.

I’ve done both in my career.

As an example of the first case, we can take the HealthySMS system that I built for Adrian Aguilera for his work supplementing traditional group-based therapies with additional content and messaging throughout the week. The University of California, Berkeley owns HealthySMS, and researchers that adopt that system pay a standard license fee to use it.

The major issue with this is that we end up with two teams of university lawyers that have to negotiate a license among themselves, which dramatically slows things down and raises costs for simple research projects. We also run into extended discussions of who owns any new changes to the system (e.g. adding WhatsApp support) introduced by new research projects, and all the additional complexity and negotiations that can entail.

For the second case, I’ve also created entirely new text-messaging systems for clients without a budget to license HealthySMS, or in cases where it would take fewer resources to build something new than to contort the original platform into a form that it was never really designed to accommodate. I’ve built enough texting systems in my day that it’s pretty easy to whip together a custom Python script that achieves new research objectives. (I’ve been doing this since 2006 – see my chapter with Eszter Hargittai in 2006’s Research Confidential.)

The problem with this second approach is that it results in supporting a myriad of very similar, but subtly different systems. It becomes very easy to lose track of which system supports which features, and when the inevitable requests for extensions and changes trickle in, we end up with a fractal growth in complexity, which is the bane of all maintainable software systems.

Around 2020, having lived through the downsides of both of these approaches for several years, I decided to try a third option, inspired by the children’s fable of Stone Soup.

For those who don't recall that particular tale, a traveler with an empty pot arrives in a village and, with no money to his name, sits down, starts a fire, and begins heating water in his pot. The villagers watch him with curiosity. He tosses a stone into the pot, and a child asks him what he’s doing. The traveler replies:

“Why, I’m making a stew, of course.”

The traveler dips a ladle into the hot water, and tastes it.

“Not bad, but this could use a little bit of carrot.”

A resident of the village rushes home and brings him back a carrot. The traveler chops up the carrot and tosses the slices into the water. After it’s cooked a while, he gives a bowl of the “stew” to the villager as thanks. He tastes it again, and proclaims that it’s much better, but what would make it truly excellent would be some potato. Another villager rushes home to bring the traveler a potato. Into the “stew” it goes. After it cooks a while longer, he ladles a spoonful into the bowl of the villager who gifted him the carrot and serves a bowl to the one who contributed the potato.

This cycle continues many times, until everyone in the village has contributed an ingredient to the stew, and the entire town is feasting on a delicious stew for dinner that evening.

This simple children’s tale of cooperation and sharing inspired me to do the same in software.

So, how does that work?

Once you look past the legal language and the contract terms, my research clients fundamentally demanded a work-for-hire relationship where they owned everything for two major reasons:

As a way to hedge against the risk that I would disappear or change the terms of what I was offering. They wanted to be able to continue using and adapting the software I had built, even if I fell off a sailboat and drowned, or if someone bought me out and started charging usurious prices for what I had created.
On the off chance that the system I was building proved to be useful, the institutions wanted the ability to commercially exploit what I had built without having to ask me for permission.

(A third reason to demand a work-for-hire contract is when the customer may have an interest in denying someone else the ability to reproduce what I was being paid to build, but that factor generally isn’t at play in research institutions after the customer has staked their claim by publishing their work about the new system.)

Given universities’ reasons for wanting to own what I was building, I quickly arrived at a third way to make everyone happy: I would retain ownership of some parts of the system – typically the elements that didn’t have any research value or novelty on their own – and I would license those elements back to the client under an Apache 2.0 license, under which they retained their freedom to exploit or move forward with the system without me. I had already been licensing Passive Data Kit to clients under similar terms – and I found that as long as I explained to the customer why I was including language about “preexisting software components” in our contracts and statements of work, they generally caught on pretty quickly, and I encountered much less resistance to this approach than I originally expected.

From a macro view, customers still owned the aggregate product that was their research software system. I carved out specific ownership rights to the underlying components, and licensed those components to the client under commercial-friendly terms. In the age of open-source software, this is a more common arrangement than not. Just look at your smartphone. It's brimming with open source, no matter how hard the manufacturer may try to hide it.

The Automated Conversational Kit that we’re pulling together at BRIC is, under the hood, a collection of libraries and components that I’ve developed and maintained while working for a global set of universities. The first element was the Django Dialog Engine, a Python library built for authoring automated conversation scripts, instead of creating one-off bespoke Python programs. Simple Messaging and its constellation of related libraries (Switchboard, Twilio, Azure, etc.) came from abstracting all the common elements I was building for text messaging projects. I pulled out Simple Data Export from Passive Data Kit because I was getting tired of reinventing reporting engines. Simple Backup arose out of the need to sleep at night, knowing that I could recover a system in a predictable and reliable manner if a server failed.
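To make the idea of abstracting the common elements concrete, here is a minimal Python sketch of a messaging layer with swappable transports. It is an illustration only, not the actual Simple Messaging API: the names `Transport`, `InMemoryTransport`, and `Messenger` are hypothetical, and the stand-in transport records messages rather than calling any real vendor service.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Transport(Protocol):
    """Hypothetical interface: anything that can deliver a text message."""

    def send(self, to: str, body: str) -> str:
        """Deliver the message and return a transport-specific message ID."""
        ...


@dataclass
class InMemoryTransport:
    """Stand-in transport that records messages instead of sending them."""

    name: str
    outbox: list[tuple[str, str]] = field(default_factory=list)

    def send(self, to: str, body: str) -> str:
        self.outbox.append((to, body))
        return f"{self.name}-{len(self.outbox)}"


@dataclass
class Messenger:
    """Study-facing layer: project code calls this, never a vendor SDK."""

    transport: Transport

    def send_survey_prompt(self, to: str, question: str) -> str:
        return self.transport.send(to, f"Survey: {question}")


# Two studies share the Messenger code and differ only in configuration.
study_a = Messenger(InMemoryTransport("vendor-a"))
study_b = Messenger(InMemoryTransport("vendor-b"))

id_a = study_a.send_survey_prompt("+15555550100", "How did you sleep?")
id_b = study_b.send_survey_prompt("+15555550101", "How did you sleep?")
print(id_a, id_b)
```

In a design like this, the study-specific code depends only on the small `Transport` interface, so a new project can reuse the shared layer and swap the delivery backend through configuration instead of rewriting the messaging logic.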

These days, my software development work consists less of taking a requirements document from a researcher and figuring out how to turn that into a study app, and more of reviewing those requirements, figuring out how they map onto existing reusable infrastructure in our (and others') toolbox, and deciding whether it makes sense to create new infrastructure that will likely be requested in the future or to create a bespoke one-off that the research customer owns entirely.

A skeptical reader schooled in traditional intellectual property concepts may have read the last couple of paragraphs aghast. I’m creating and curating a collection of libraries and tools on clients’ dimes and they aren’t reaping the full benefits of the work they’re paying me to do?!?

My response is to point out what the client does gain under this approach:

Projects begin much sooner, development time is shorter, and the engagement ends up costing much less, because I’m not being paid again to re-implement systems from scratch. I’m not billing clients 20 hours to get a basic system up to send and receive text messages from Twilio. They’re getting billed 1 hour for me to incorporate and configure an existing component, leaving those 19 extra hours to focus on the parts of their system that are truly novel and scientifically interesting.
As I maintain the components and address issues that inevitably emerge through real-world deployments, I’m identifying and addressing unexpected edge cases and bugs, and those fixes and enhancements flow to everyone using these components, not just the immediate client project where I encountered the issue.
A shared foundation of interoperable components aids in replicability. This has been important as we’ve been integrating generative AI support into text messaging projects. We’ve created a common LLM framework that plays nicely with Django Dialog Engine and the Simple Messaging libraries, so researchers can share their configurations and others can easily replicate their work, instead of trying to map independently developed systems onto each other.
It allows us to spend time implementing things The Right Way once, and not having to impose big compliance costs onto clients. By this, I mean taking the time to follow best practices outside the context of a client engagement, implementing important functions like auditing and robust logging that often go unrequested, and fulfilling regulatory compliance requirements under laws like the GDPR and HIPAA that never make it into the project requirements.

While I’m calling this “Stone Soup Software Development”, in reality, this is just the traditional open-source model, but instead of us being a simple middleman for connecting others’ open source (such as a database or web server) to study requirements, we’re also building our own open source in contexts that reflect the needs of researchers and their institutions.

Taking on this responsibility ourselves also empowers us not only to build for today’s research requirements, but to have a foundation in place for tomorrow’s studies. It allows us to extrapolate a bit into the future and help our clients manage constantly shifting technology platforms and paradigms. It allows us to intelligently exploit the broad perspective that we now enjoy from working with so many investigators, both to identify areas that need attention broadly and to spot ways of unlocking new investigative opportunities.

One of the primary reasons that Ericka and I founded BRIC was that we both recognized the importance of thinking intentionally about infrastructure: not just developing the best infrastructure for a single project, but taking a wider perspective and building the best infrastructure for investigators in general. This "Stone Soup" approach that served me and my clients so well during the Audacious Software years is the default perspective we'll be using at BRIC. Instead of putting the weight on the question of Why we should open source infrastructure, the taller hill to climb will be Why Not share the work we're doing internally and on behalf of the public we're serving as a non-profit entity.

And it's not just the "Stone Soup" attitude that's transferring from Audacious Software to BRIC - I'm pleased to announce that the open source components and infrastructure that have served me and Audacious Software for so long will continue to be developed, maintained, and extended under the BRIC umbrella.* This includes Passive Data Kit, Automated Conversational Kit, and a variety of supporting libraries, frameworks and tools. Under BRIC's stewardship, we'll be seeking opportunities to work with similarly open and future-focused clients and funders to contribute to this delicious stew, and to share it with others who have their own flavors to add.

(* Just be patient with us as we update GitHub repositories, copyright notices, and all of that!)
