Saturday, December 13, 2025

Why I am a Platonist and you should be too

Plato thought that there were things that we could learn about the world.  He thought that if we had enough time and patience and the right collaborators, everything would eventually become clear.  His faith was less ontological and more social than the caricature that we call "Platonism" today.  He used myths and metaphors as a backdrop to the main event of his philosophy, which is dialectic, an essentially social activity.   His faith was simply that there were things to know and that we could learn together by talking to each other.

The static, absolutist mindset traditionally associated with Platonism doesn't follow from this faith.  Kant's argument that what we can learn is conditioned by us as much as it is about the world that we inhabit is completely consistent with Plato's faith.  What we need is dialectic, not mysticism.  We need analysis and self-criticism, not a fixed view of the world.

What worries me now is that while those on the Norvig side of the Norvig-Chomsky debate call their opponents, "Platonists" it is in fact they who lead us down the static, mindless path.   

When the Internet was first really taking off,  I had a friend who was incredible at finding and sharing via email an amazingly curated set of links and messages.  He knew what kinds of things interested me and the other recipients of his blasts.  We talked about things.  My initial reactions were often negative and sometimes I felt like I was wasting time looking at his "feed" (word did not exist back then), but equally often I found that what initially just annoyed me ended up leading me into discussions and ideas that changed my way of thinking.

Now we are subject to algorithms that look at where we live, what we click on and a random set of demographic and / or personal financial data about us.  Nothing that knows us in any real sense.  Still, the algorithm will select stuff that is better than a random selection of articles for you to look at.

In both cases, you get a feed.  In the first case, the feed is part of a conversation with an intelligent narrative underneath it.  The second is a black box.  As you provide more data, the algorithm will increase its likelihood of giving you things that you like.

Now think about what happens over time in these two scenarios.  In the ongoing human dialog, your personal growth is supported by the conversation and the conversation is enriched as a result.  You get challenged by new ideas, some of which seem odd or dissonant to you.  Sometimes you end up annoyed, but sometimes you learn.  Plato would be happy as the virtuous cycle of dialectic and dialog continues.

In the second scenario, what you are fed is conditioned by what you will like.  You are rarely challenged and you settle into a static world view.  Plato would not be happy as he watches you fall into a vicious cycle to become a non-thinking absolutist.

The current state of AI is really bringing the Chomsky-Norvig debate back to the center.  Norvig's famous quote "every time I fire a linguist my models get better" lifts today to "every time I fire a theorist my models get better" across all kinds of domains.  Do we even need scientists any more?  

Plato - and I - would say we absolutely do.  But the reason isn't that we can make better predictions, discover patterns and relationships that we didn't know about before or even generate new theories or prove new mathematical theorems faster than AI can.  The reason is that our collaboratively developed shared understanding of the world, woven into the conceptual narrative that we call "science" is essential to who we are as a species.  The dialectical process that builds fragile consensus, sustains it for a while, then drives paradigm shift is not primarily an engineering optimization. It is a human social activity.   AI can absolutely obliterate humans when you cook it down to loss functions.  But that is completely missing the point of science.  Our innate desire is to understand.  Ability to predict and control is a side effect, not the primary goal of science.

The recent stalling of progress in AI has some people saying that what we need are "world models."  While this supports my argument and on the surface looks like "Platonism" in the caricature sense, I think we have to blow up the whole idea of static "models."  Again, science, art, and culture are all dynamic social activities.  Do we want these activities to be driven by algorithms with whatever-first-worked loss functions leading the way, or do we want things to be driven by human interactions?  Plato would see the former as like letting the Sophists lead culture and philosophy.  The problem, as Plato so clearly put it, is that when you let the sycophant-engines run unbridled, those pulling back toward real engagement end up "judged like a doctor brought before a jury of children with a cook as prosecutor" (Gorgias 492e).

The easy response to my challenge above is that we should just look at AI as another great "conversant" or "partner."  We can have it both ways - rapidly accelerated "progress" while somehow keeping the humans in charge.   At some level, I agree with that.  But a lot of discipline is going to be required.  Most importantly, we need to stay engaged and focused on significant culturally generative activities.  That means sustaining the drive to understand and to build shared understanding collaboratively.  That is what Plato tells us to do.

OK, as the old Greek was fond of ending things, let us be going. 





Friday, November 28, 2025

I didn't have time...

 "I didn't have time to write any unit tests."

"I didn't have time to workout."

"I didn't have time to visit my parents."

"I didn't have time to read any books."

When I find myself starting to say things like these, or when I hear them from others, I have started substituting "take the time" for "have the time."  So, for example,

"No, you mean, 'I didn't take the time to workout.'"  

That one change changes everything.  It adds agency and accountability.   

I take the time, every morning, to go running or walking with my dogs.  Some days that is really hard to squeeze in.  But it comes off the top.  Like sleep, taking the trash out, brushing my teeth.  Hard days are hard days.  Skipping workouts or other activities of daily life actually only makes them harder.

When I see code with no tests and I hear the excuse "I didn't have time to write tests,"  I see lack of accountability.  I would rather have well-defined contracts and failing tests with no implementation than code with no tests.  The former is progress.  The latter just adds risk and debt.  People who work for me sometimes don't get this right away.  It is a great example of how you usually have more agency than you realize.  I expect people to take the time to do things correctly and to include the time needed in their estimates of how long things are going to take.  I know that I am not alone in this.  People often create caricatures in their minds of leaders, family members and other unseen forces driving them to squeeze out all but the most urgent activities in their work or home life.  When you ask them, "Are you sure that is really what X wants?" where "X" is the leader, spouse or other stakeholder, the answer is often surprising and in fact just asking the question can lead to a realization of agency.

So just try this:  every time you find yourself saying "I didn't have time to..."  force yourself to say out loud "I didn't take the time to..."

Monday, May 26, 2025

Optimizing objective functions defined over partitions using the Genetic Algorithm

Earlier this year, I needed to find the best way to divide a set of $n$ things into $k$ subsets so that when an optimization is run in batches with the universe divided that way, the best overall result is achieved.  Mathematically, the problem is to optimize an objective function defined over partitions of a set.  A common instance of this problem is cluster analysis, where the objective function is a summary measure of within-cluster variation.  Another example is the multi-way number partitioning problem that arises in job scheduling, where the problem is to allocate jobs to processing units so that some characteristics of performance or throughput are optimized.  These problems are in general NP-hard, unless the objective function has nice properties or there is some natural way to limit the search space or to direct the search.

Brute force is never really an option even for small $n$.  The number of ways to divide a set with $n$ elements into $k$ subsets grows very, very fast.  For example, with $n =100$  and $k = 15$, there are more than $10^{100}$ ways to partition the set.   See Stirling Numbers of the Second Kind for the formula.

For the general problem, in which nothing useful is known about the objective function, dynamic programming, integer programming, greedy algorithms, local search, simulated annealing, multilevel k-way partitioning and genetic algorithms are all possible ways to attack the problem.  We focus here on the last one.

Solution using the Genetic Algorithm

One way to direct the search for good partitions is to represent partitions as chromosomes and use the Genetic Algorithm (GA) with chromosome fitness determined by the objective function.

This github repo contains a framework for optimizing objective functions defined over partitions using a Java implementation of GA.  The Apache Commons Math GA implementation that it uses is not optimized for performance or scale. Neither is the optimize-partition framework built on top of it.  This is proof of concept code to use in experiments to see if the GA can find good partitions.  The tl;dr is that it can as long as either $n$ is small or good partitions are not too sparse in the topology of pointwise convergence on $^n k$ (viewing $k$-size partitions as functions $n \to k$).  That is just a fancy way of saying that the implementation will find good solutions in reasonable time if given a random partition, one can turn it into a "good" partition by making a reasonably small number of element placement changes.  

The chromosome representation of a partition views the partition as a list of partition-piece values, one each for the elements in the universe.  Here is an example.

Suppose that $U = \{0, 1, 2, 3, 4, 5\}$ and $P = \{\{0,1\},\{2\}, \{3,4\}, \{5\}\}$  is a partition of $U.$

We can represent $P$ with the integer array $[0,0,1,2,2,3].$  The first two $0$ entries indicate that the first two elements of $U$ are assigned to the first partition piece in $P$.  The next value means that the third element of $U$ is assigned to the second partition piece.

The sets in a partition don't really have an order, so relabeling them results in a different chromosome representing the same partition.  For example, $[2,2,1,0,0,3]$  is another chromosome representation of partition represented by $[0,0,1,2,2,3].$   Every partition with $k$ pieces has $k!$ chromosome representations.  At first this looks like a suboptimal feature of the encoding, but it actually helps the search as each good partition ends up having $k!$ basins of attraction.

To apply the Genetic Algorithm, we need a way for chromosomes to "cross" with one another to create new chromosomes.  Our first attempt at crossover interleaves partition placement decisions of the parents.  So for example, 

$[0,0,1,2,2,3]$ crossed with $[2,0, 2, 3, 1, 3]$ gives $[0,0,1,3,2,3]$.  

The first chromosome determines the first value, the second one the second, the third comes from the first, and so on.  Sometimes, crossover defined this way results in "holes" (unused partition pieces).  Those have to be discarded when we are searching for fixed size partitions.

Our first idea for mutation is just random reassignment of a single value.

The algorithm starts with an initial population of chromosomes, usually randomly generated. 

Then successive populations are generated by

  1. Compute fitness for each chromosome in current generation
  2. Pass the top elitismRate chromosomes directly through to the next generation
  3. While next generation still has space, randomly select 2 parents from current generation and
    • With probability crossoverRate, replace the parents with each crossed with the other
    • With probability mutationRate, replace the crossed pair with mutation applied to each element.
Evolution continues until numGenerations generations have been created.

Experiments

Unit tests in optimize-partition apply the GA to some partition search problems with known optimal solutions.  In each case, with suitable parameters and enough generations, optimize-partition finds an optimal solution.


Spread the max value

Suppose $U = \{0, ..., 99\}$ and $g: U \to \mathbb{R}$ is defined by by $g(i)= 10$ for $i < 5$ and $g(i) = 0$ for $i \geq 5.$

Then define partition fitness by summing the max value of $g$ over each piece of the partition.  So for example,  given the $5$-piece partition is 

$P = \{\{0, 1\}, \{2,  3\}, \{4, 5\}, \{6\}, \{7, ..., 99\}\}$

$P$ has fitness $10 + 10 + 10 + 0 + 0 = 30$

Optimal $5$-piece partitions are those that take the five initial elements of $U$ (where $g$ takes the value $10$) and spread them across the pieces of the partition.  The maximum attainable fitness is $50$.

The test cases in the class TestOptimizePartition show that the GA consistently finds optimal partitions for this problem.

Cluster analysis

The test cases in the class TestClusterPartitionOptimizer test a form of GA-clustering on randomly generated test datasets.  The universe for each test consists of 50 random points in $\mathbb{R}^3$.  Each point is a random deviate from a pre-determined centroid created by adding Gaussian noise to each component.  Each centroid has 9 deviates around it.  The centroids are generated to be at least 10 units apart and the component noise has standard deviation $0.1$, so their is (almost certainly) a unique solution and the clustering problem is easy.  The objective function is the negated sum of intra-cluster pairwise Euclidean distances.  The tests show that the GA finds the unique optimal solution consistently after 100 generations.

Our implementation of this example is basically the same as that described in A Genetic Algorithm Approach to Cluster Analysis.  See that paper for more empirical performance and accuracy results.



Thursday, April 17, 2025

Leading in uncertain times

At the beginning of the pandemic, I wrote the post, "Put on your own mask first."  That was good advice then and it is good advice now.  But the foundational challenges that we are facing today require a little more than just maintaining a confident and positive attitude. 

Acknowledge uncertainty and factor it into your plans, but have a plan and stick to it

People need to know that their leaders have a plan.  They don't need to know all of the details and they don't have to understand the full context, but they do need to know that there is a plan and they need to understand the basic rationale behind it.  Nothing makes people more nervous than the feeling that their leaders don't have a coherent plan.  It is OK for the plan to change when it has to, but at any given time, there has to be a plan of record that team members can anchor themselves to.

Encourage questions and answer them honestly

One of the hardest things about leading in uncertain times is that you face this double-whammy of not having buttoned-up answers to everything but needing to get in front of the team more often and sometimes with little time to prepare.  Hiding from the team until you have a polished message or just talking at them is the absolute worst thing to do in these times.  Show up as your authentic self and keep coming back to the plan and the principles underneath it.

Show your team that they can count on you personally

In difficult times, bad leaders break.  Even good leaders make mistakes.  But their teams see them acknowledging their mistakes and doing everything possible to recover from them.  Your team needs to see you as the one who is going to lead them out of whatever kind of mess they or your business have gotten into.  They need to feel like you can solve any problem, even though in fact the way that you do that is by getting them to think about the problem in the right way.

Manage conflict proactively

Conflict is natural and can be a healthy part of team dynamics.  Like a controlled burn, however, it can have really bad effects if it is not managed.  In uncertain times, it's as though there is a ton of very dry tinder just waiting to explode on your team.  You need to be extra vigilant to control the burn in these times.

Keep up the pace

When I run with my dogs, if the pace is too slow, they get distracted and the whole thing kind of falls apart.  The same applies to teams.  Stop/start, indecision, putting things "on hold" - you can't let these things slow the pace to the point where the distractions kick in.   In uncertain times there are lots of distractions.  You need to keep the dogs barking.  Similar to managing conflict, this requires that you anticipate the stops and starts and keep things moving.

Celebrate initiative, agency and control along with success

Celebrating success is always important, but in uncertain times it really helps to link and label the evidence of initiative, agency and control that led to the success.  When people have a sense that a lot of their world is "out of control"  it really helps to show them, ideally with something that they have just completed, that in their job at least, they do have agency and control and all they have to do is take initiative.  

In uncertain times, people need more from their leaders.  You absolutely need to "put your own mask on first" and make sure you have the support of your own leaders, peers, friends and family, but you need to show up for your team - even when you don't have all of the answers and when you may be facing doubts yourself.  Just taking the hard question, celebrating a small success, or helping personally with a small problem can have a big impact.

Friday, March 21, 2025

The Great Experiment

The following is the transcript of a commencement address that I gave at the Nora School in 2015.

First, let me thank you for allowing me to share this wonderful day with you.  Let me add to the thanks that are rightfully exchanged today.  Thanks to the parents, whose great works are being celebrated today.  Thanks to the teachers, whose combination of pride and sadness of the ladder being kicked away I can personally relate to.  And thanks to those who make this place, the Nora School, a place where students can grow.  There is no more important work than what you do here and no more precious community than what all of you - students, teachers, parents, administrators, staff and friends - have built here.  Thanks so much for letting me be a part of it today.  

I want to talk to you today about another community that we all belong to.  That community was described by one of its founders as “a great experiment.”  It’s been more than 200 years, but in a lot of ways, it’s still an experiment.  I am, of course, talking about the United States of America.  Now before anyone heads for the exits, let me assure you that I am neither headed off into a xenophobic rant or any kind of political diatribe here.  I just want to think a little bit about what it means to be part of the American experiment today and what you who are inheriting its leadership can do to help it succeed.  Whether you are citizens or not, patriots or not, residents or not, our community welcomes you and we need your help.  

When I was a bit younger than you, we had - and lost - an inspirational president who challenged us with the oft-quoted words, “Ask not what your country can do for you, ask what you can do for your country.”  Out of context, these words lack power.  When you add the words that precede the famous quote, you see that he is not just talking about casual volunteering, or some kind of “discretionary” effort.  He is talking about the life’s blood of the Republic.  Just before the famous quote, he says,

“In the long history of the world, only a few generations have been granted the role of defending freedom in its hour of maximum danger. I do not shrink from this responsibility--I welcome it. I do not believe that any of us would exchange places with any other people or any other generation. The energy, the faith, the devotion which we bring to this endeavor will light our country and all who serve it--and the glow from that fire can truly light the world.” 

Kennedy was acutely aware, as all great American leaders have always been aware, that the only way that this great experiment can succeed is through the strength, ingenuity, resourcefulness, independence and passion of an informed electorate committed to actually making it work.

I am asking you today to become that - the generation that transforms American democracy.  There is a lot of work to do.  First, you need to really study important issues and insist on engaging in open, honest, probing and visionary debate about them.  What are the important issues?  I can tell you the ones that are important to me; but what really matters is what is important to you and why it’s important.  We can easily agree that the ad hominem, sound-byte nonsense that dominates political dialog today is not important.  So demand something different.  Find and support candidates who bring something different.  Talk to your friends and family and get them to demand something different.

I know not all of you are interested in politics and some of you may be so disillusioned by what you see in the politics feed that you just want to #fail it and let others worry about it.  Realistically, I can’t expect many of you to really engage directly.  I get that.  But that doesn’t mean you can’t respond to Kennedy’s challenge. Your example and influence, the choices you make, what you think about and talk about every day - all of these things contribute to defining who we are as a community and what we demand of our leaders.

The “who we are” part is what I want to challenge you to think about today.  I am not going to give you answers.  I am not going to tell you who to be.  I am just going to give you some things to think about.  And I want you to keep in mind the basic fact that like it or not, who we choose to be as individuals determines who we can be as a community.  To hack JKF, I guess my main point today is “ask not who your country is, ask who you are.”

One of my favorite passages in literature is the beginning of the Platonic dialog, the Protagoras.  The dialog starts with a young Athenian, Hippocrates, awakening Socrates before dawn with the urgent request that he come introduce him to Protagoras and get him to agree to take him on as a student.  Protagoras is a sophist - an itinerant peddler of what might be called finishing school services for aristocratic Greeks.  He promises to teach his students rhetoric and arete (variously translated virtue or excellence). Hippocrates wants desperately to get this training; but he has a hard time answering Socrates’ questions about what exactly he expects to get out of it.  At one point, when it has become clear that Hippocrates really has no idea what he is going to learn from Protagoras, Socrates gives him the following warning:

“Knowledge cannot be taken away in a parcel. When you have paid for it, you must receive it straight into your soul.  You go away having learned it and are benefited or harmed accordingly.”  He is making a very important point here.  You don’t get to take knowledge back to the store if it turns you into a person that you don’t want to be. You are what you learn.  That’s the first thing that I want you to think about.  When you decide what to study, who to listen to, who to work for, who to marry, who to pray with - all of these decisions are going to have irreversible impacts on who you are. And who we are.  I will offer you the same simple advice that Socrates gives Hippocrates - when making these decisions talk to - and listen to the opinions of - those who you know and respect.  

The second thing that I need you to think about is the influence that you have on others. It’s a little unfair, but everything you do sets an example.  In a convoluted but eminently logical statement, Immanuel Kant once said, “Act only on that maxim that you can at the same time will to be a universal law of nature.” This “categorical imperative” is the cornerstone of Kant’s moral philosophy.  Often paraphrased as “don’t make an exception of yourself,” what it actually means is more than that: make an example of yourself.  Like I said, that is not fair, especially for a young person.  But it’s the hand we’re dealt.  I remember when I was just a couple of years older than the seniors graduating today, I learned this lesson in a very painful way.  I lost a friend and the world lost an amazing human being.  I could have been a better example and I could have used my influence to prevent a tragedy.  I did not.  And I will never forgive myself.  Most examples are less dramatic than that, but the older you get, the more you will look back on your life and ask yourself, how were people better for having known me?  How were communities better for having included me?  We are all individually the products of the examples that others have set for us and we are collectively the result of those that we choose to follow.  The waves of social change are formed from little ripples when people decide what is cool, what is acceptable, what is inspiring, what is expected.  Little by little, the small things we go along with end up turning into big swells that carry us all along.  You may not think of yourself as a trendsetter, a role model or an example to emulate - but like it or not, you are all of these things, all the time. Think about the example you are setting.  Think about what you are defining as acceptable, inspiring and expected.

If the first part of your life has been about making sense of the world, it’s now time to start thinking about making sense to the world.  Individuality, creativity, spontaneity and adaptivity are all wonderful things that we welcome and need from you.  But as a French jazz player once put it to me, you can’t just play “n’importe qua” (just anything).  Somehow, just as your experience needs to hang together in a way that allows you to say “I think…” before every perception that you have about the world, so what you do and say needs to make sense to those around you, so they can say, “she thinks…” so they can play along, variously being inspired by and inspiring you.  We get the word ‘integrity’ from the same root that gives us ‘integer’ - a unified whole. Something or someone with integrity is first and foremost one thing.  A structure that has integrity holds itself together.  A person with integrity thinks, feels, speaks and acts with one voice – always the same.  Sure, your voice will grow, adapt, and evolve over time.  Just bring us along with you and we can all make better sense of the world and make better decisions.

I have asked you to think about three things today - who and what you allow to influence you, the example you set for others and how you can really be one person in the world.  I have asked you to do this because how well you do with each of these challenges will determine how well any community that includes you will do.  How well you respond will be the difference between a failed experiment and the realization of the great dream shared by every generation before you.  This great experiment really can succeed.  We really can realize Kennedy’s dream.  Like Kennedy himself and every other human who has ever walked this earth, we all have strengths and limitations, proud moments and moments of shame, kind moments and mean-spirited ones.  We’re not going to be perfect and we’re certainly not always going to agree.  We just need to share the commitment to really think about who we want to be and to try to stay true to that vision.  If we hold fast to this commitment, we will see in our Republic what Plato envisioned 2000+ years ago: “Justice writ large” growing from honest, self-critical dialog within and among individuals.  We just need to really care about who we are and gently but firmly call each other out when what we are doing just doesn’t make sense. 

What I am asking you is very hard.  Day-to-day pressures and rewards will often pull in the opposite direction - toward shortsighted, selfish, mindless and retreating actions and habits.  You will work with and for people who lack vision and integrity.  You will be part of groups that tolerate and even encourage bad behaviors.  Groups where what is accepted and expected makes no sense to you.  When you see that happening around you, you need to stand up and take risk - calling out the bad behaviors and challenging the group to define itself.  The courage that you show in these moments is every bit as important as the courage shown by the bravest soldier in the fiercest firefight.  You are both doing the same thing - defending an idea.  And we need desperately for you to do that.

Senator John McCain provided an example of this kind of courage on the floor of the Senate last year.  He called us out for doing something that did not make sense to him.  He said, “In the end, torture’s failure to serve its intended purpose isn’t the main reason to oppose its use.  I have often said, and will always maintain, that this question isn’t about our enemies; it’s about us.  It’s about who we were, who we are and who we aspire to be.  It’s about how we represent ourselves to the world."  

He goes on to say,  “When we fight to defend our security we fight also for an idea.”  The “idea” that McCain refers to is the same idea that motivated George Washington and the other founding fathers to launch the great experiment that is the United States of America.  The same idea that Abraham Lincoln hoped could “long endure” and that JFK challenged my generation to defend in its “hour of maximum danger.”  

It’s easy to look around us today and see examples of terrible leadership, failed institutions, structural inequity and dysfunctional politics.  It’s easy to give up - blaming “bad people” who have somehow co-opted our corporations and political institutions.  But those people are us.  When you look inside these institutions, you will see people just like you and me - all trying to find their way in the world, all looking to each other for inspiration and approval. These organizations really can be transformed from within. Every leader of every institution today will eventually be replaced by people in your generation.   You can, as Mahatma Gandhi so succinctly put it, “be the change you want to see in the world.”

One day, you will stand where I stand today, asking yourself what you are leaving the next generation and asking them to step up and lead. I sincerely hope that you will have the same optimism that I have now. The optimism of an excited scientist who feels like she is on the cusp of a great discovery. The optimism of a proud parent who sees a better future for his children.  Better not just economically, but socially and culturally because he envisions them as not just better off than him but better than him and part of a better community.   

You can lead us to realize Kennedy’s dream if you step up to the simple challenge that I have laid out for you today: Develop yourself. Set an example. And be who you are. Now, to end with my favorite quote from Plato, “let us be going.”  Thank you.

Monday, May 27, 2024

How to become a committer

Wearing three different hats over the years, I have received three different versions of the same question, to which I always respond, "You are asking the wrong question."

  1. What do I need to do to get an A in your class?
  2. What do I need to do to get promoted?
  3. What do I need to do to get commit?
This post is about question number 3., but the basic concept in all three is the same: you need to change "get" to "earn" or "become a good candidate for" to focus on the right question.

So what does it mean to be a good candidate for commit?

  1. Strongly net positive energy flow  Not everything you do in an OSS community adds energy.  Sometimes you will ask stupid questions, submit bad PRs, take offense, offend others, lick a cookie, or do other things that the community would be better without.  These things need to be balanced by good questions, helpful PRs, correct answers and other community-helpful things.  We all have our bad days and stupid moments, so don't obsess over always being "right," but do try to make sure that when you ask yourself "Am I really being useful in this community?" the answer is a strong "yes."
  2. Real mastery of some aspect of the project  Commit means you are trusted to merge PRs.  In some projects, commit may be limited to certain branches or docs or whatever; but the basic idea is the same in every case: you are trusted by the community as a steward for the project's assets.  To earn that trust you have to demonstrate real mastery of some part of the code, documentation or other non-code assets of the project.
  3. Understanding and following the ways of the project  OSS communities vary widely in how they work.  This kind of overlaps with 1., as if you don't understand and follow the written and unwritten rules of the project you will end up being an energy sink as people will have to correct you all the time. Of course, healthy OSS communities are always open to new ideas about how to do things, so if you don't like the way things work in a project initially, you may be able to drive change later.  But you will never be able to that or anything else useful unless you first take the time to learn how things are done and initially adjust your personal style as necessary.
For the remainder of this post, I am going to focus on practical strategies to achieve number 2. as a code contributor.  But it is really important that you also achieve 1. and 3. and most importantly you have to really want to achieve all three.

Many of the best contributors to open source projects start off as users of the software.  This is usually the best way to start.  The ideal scenario is that you are using the software as part of your day job, or some component of something you work with uses something from the project.  If that is not the case,  you should try to find a work or personal project that uses software from the project in some way. 

Start by really mastering the code in your own project that touches the OSS.  For example, suppose that you are interested in getting involved in Apache Kafka and you have a project at work that uses it. Look carefully at the code that uses the Kafka client APIs and the configuration of the Kafka system components.  These things may be hidden from you by an abstraction layer somewhere.  If so, go find that code.  Start by understanding why working code using the project works.  Make sure you understand why the specific APIs being used are the right ones to use for what the code is doing.  Or if you are starting something that uses the project, get it to work and make sure you can explain exactly why it works.  Confirm this with tests in your own project.

At first, confirm your understanding of how your app works just using the documentation, other online sources and your own testing.  Then take the leap to look at some code inside the project.  Sometimes the code that your own code interacts with directly is not very enlightening or it may be difficult to understand. That's OK.  Go find some other code that it looks likely that your code is exercising that looks more interesting or understandable. Look at its documentation, unit tests and recent commit history. 

After poking around for some time, making changes to your code and watching what happens when you play with release sources and binaries, you can take the next step, which is to build the software.  Depending on the project, this may require some patience and even some special tools or access to a special environment. OSS communities die if it is not possible for newbies to figure out how to build the software, so there has to be a way.  You need to figure it out. First look for build docs.  Most projects have them.  Try your best to get the build to work, but don't spend many, many hours stuck.  When you do get stuck, go back over everything you think you know about the build, look through project archives and docs and if you are still stuck, come up with the simplest possible question the answer to which is likely to get you unstuck.  Ask the community that question.  Often some script, doc, test or main code is either misleading or broken and that simple question can be very helpful - especially if you get a simple answer and your first contribution is to fix whatever is misleading or broken so the next newbie does not get similarly stuck.  That is being net positive.  Trying once and asking for help immediately is not.

Once you can build the software, you can make changes to it and watch what happens.  A fun game to play is to see if you can do things that won't break the build but will make an observable difference in your application.  Even adding log messages or debug print statements can help build understanding. If the project has good tests, breaking them will be easy.  Intentionally breaking tests and explaining why your change breaks them is a very good way to learn the stated and unstated invariants in the code.

The play steps above may not seem like a direct path to mastery, but if you skip them and try to go directly to attacking issues or conceiving a great contribution, you will end up stuck and frustrated.   A new codebase is like a new neighborhood.  If you just use GPS all the time to go as fast as possible to chosen destinations it will take you a long time to actually learn the place.  If you allow yourself to walk around a bit you will not need the GPS as much and you will end up always knowing not just one, but several ways to get to where you want to go.

A good place to start contributing is in tests and / or documentation.  Assuming that you have found and penetrated an area of the code to the point where you have a decent understanding of its behavior, you can ask yourself if the existing docs and unit tests fully explain and confirm the behavior.  Almost always, you will be able to find some things that you are not sure about or that seem vague or misleading in the documentation.  Write tests to first discover, then confirm the behavior.  Then ask the community if in fact this is the desired / expected behavior.  If it is, create a PR that includes the unit test and a patch to the documentation that clarifies the contract of the code.  Make sure that your PR passes all tests and works with whatever CI system the project uses.  Keep things as simple as possible and don't try to combine too many things into one PR.  Simple PRs that improve documentation and tests tend to be thankfully accepted.  Focusing on tests and docs initially also helps deepen your understanding of the code.

Another good place to start when the opportunity presents itself is on straightforward, labor-intensive tasks.  Upgrading dependencies, replacing deprecated methods, adding annotations, fixing linter errors, or carrying out other boring, but useful refactoring or code improvement tasks are all things that in some cases can be done without deep knowledge of the code, but which can be very helpful.  Make sure to pay careful attention to tests and carefully review anything that you generate with refactoring or AI tools if you take on this kind of task.  When upgrading dependencies also make sure to review release notes and test coverage for uses of the dependent code.  Break things into small PRs and make sure not to mix formatting or other kinds of changes with the specific improvements that your PRs claim to make.  Take extra time to make sure that your changes are correct, taking advantage of the opportunity to deepen your understanding of the code and tests.

Different communities use different forms of communication.  Make sure to subscribe to all relevant channels and try to follow as much as you can.  At first, a lot of the conversation, issues and PRs will be hard to follow, but over time more of it will make sense.  Start by paying special attention to your chosen area of focus.  If you see a question that you can answer or a problem that you might be able to solve, go for it.  Remember the net positive energy rule though: it's OK if you don't get everything exactly right immediately, but you need to be net positive - more useful ideas and contributions than distraction.

As you learn more about the code and community, you can start attacking bigger issues or bringing new ideas of your own.  Don't focus on impressing people or talking about what you have done.  Self-promotion does not work in OSS communities and is in general not necessary because everything happens in public, visible to the whole community.  What matters is what you contribute and how you work in the community.  If you consistently contribute high-quality PRs and participate positively in the community, one day you will be surprised to learn that you have been voted in as a committer.  







Tuesday, January 23, 2024

How to read

When I was first starting to read research papers in mathematics, I got some great advice from one of my professors.  He said, "Always have paper and pencil with you when you read a paper.  Read a little bit and then try to write the next part yourself.  Look at what is written in the paper as a sequence of hints.  That's all you are going to get.  You need to fill in the details yourself and if you can't do that, you have not understood the paper."   Over the years, I have realized that while research mathematics is kind of an extreme case, the same actually applies to any challenging text. So here is "the method":

  1. Read a sentence or paragraph or however much you need to get an idea.
  2. Write or say to yourself what you think is going to come next.
  3. Start from the beginning and read through the next chunk.  Compare your continuation to what actually came next. 
  4. Go to 2
You end up re-reading the whole piece many times this way.  For long things, use major breaks like chapters or whatever to limit the look back.

If you are trying to really learn the material, you can do the whole process repeatedly.  In that case, the checking in step 3 should start to show less and less divergence, mostly just style or sequencing.  You need to be careful though not to devolve into memorization.  You want to actually come up with the ideas that come next, not the words.

I do this kind of thing when I read hard material of any kind - not rigidly and sometimes changing chunks around.  If I go slowly enough, unless the material is really over my head or I am lacking needed background or something, I always end up feeling like I have had the ideas that the author was trying to convey.  That usually means that I can start to apply the ideas myself.