Walks in the neighborhood

Monday, May 26, 2025

Optimizing objective functions defined over partitions using the Genetic Algorithm

Earlier this year, I needed to find the best way to divide a set of $n$ things into $k$ subsets so that when an optimization is run in batches with the universe divided that way, the best overall result is achieved. Mathematically, the problem is to optimize an objective function defined over partitions of a set. A common instance of this problem is cluster analysis, where the objective function is a summary measure of within-cluster variation. Another example is the multi-way number partitioning problem that arises in job scheduling, where the problem is to allocate jobs to processing units so that some characteristics of performance or throughput are optimized. These problems are in general NP-hard, unless the objective function has nice properties or there is some natural way to limit the search space or to direct the search.

Brute force is rarely an option, even for moderately sized $n$. The number of ways to divide a set with $n$ elements into $k$ subsets grows very, very fast. For example, with $n = 100$ and $k = 15$, there are more than $10^{100}$ ways to partition the set. See Stirling Numbers of the Second Kind for the formula.
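To make the growth concrete, here is a minimal sketch that computes Stirling numbers of the second kind with the standard recurrence $S(n, k) = k \, S(n-1, k) + S(n-1, k-1)$. The function name `stirling2` is just an illustrative choice.

```python
def stirling2(n: int, k: int) -> int:
    """Number of ways to partition an n-element set into k non-empty subsets."""
    if k < 0 or k > n:
        return 0
    # row[j] holds S(i, j) for the current i; start from S(0, 0) = 1.
    row = [1] + [0] * k
    for i in range(1, n + 1):
        # Update right to left so row[j - 1] is still S(i - 1, j - 1).
        for j in range(min(i, k), 0, -1):
            row[j] = j * row[j] + row[j - 1]
        row[0] = 0  # S(i, 0) = 0 for every i >= 1
    return row[k]


# The count for n = 100, k = 15 has more than 100 decimal digits.
print(len(str(stirling2(100, 15))))
```

Python integers are arbitrary precision, so the exact value is computed without overflow.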

For the general problem, in which nothing useful is known about the objective function, dynamic programming, integer programming, greedy algorithms, local search, simulated annealing, multilevel k-way partitioning and genetic algorithms are all possible ways to attack the problem. We focus here on the last one.

Solution using the Genetic Algorithm

One way to direct the search for good partitions is to represent partitions as chromosomes and use the Genetic Algorithm (GA) with chromosome fitness determined by the objective function.

This GitHub repo contains a framework for optimizing objective functions defined over partitions using a Java implementation of GA. The Apache Commons Math GA implementation that it uses is not optimized for performance or scale. Neither is the optimize-partition framework built on top of it. This is proof-of-concept code to use in experiments to see if the GA can find good partitions. The tl;dr is that it can, as long as either $n$ is small or good partitions are not too sparse in the topology of pointwise convergence on ${}^n k$ (viewing $k$-size partitions as functions $n \to k$). That is just a fancy way of saying that the implementation will find good solutions in reasonable time if, given a random partition, one can turn it into a "good" partition by making a reasonably small number of element placement changes.

The chromosome representation of a partition views the partition as a list of partition-piece values, one each for the elements in the universe. Here is an example.

Suppose that $U = \{0, 1, 2, 3, 4, 5\}$ and $P = \{\{0,1\},\{2\}, \{3,4\}, \{5\}\}$ is a partition of $U.$

We can represent $P$ with the integer array $[0,0,1,2,2,3].$ The first two $0$ entries indicate that the first two elements of $U$ are assigned to the first partition piece in $P$. The next value means that the third element of $U$ is assigned to the second partition piece.

The sets in a partition don't really have an order, so relabeling them results in a different chromosome representing the same partition. For example, $[2,2,1,0,0,3]$ is another chromosome representation of the partition represented by $[0,0,1,2,2,3].$ Every partition with $k$ pieces has $k!$ chromosome representations. At first this looks like a suboptimal feature of the encoding, but it actually helps the search as each good partition ends up having $k!$ basins of attraction.
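The encoding and the relabeling redundancy can be sketched in a few lines of Python. This is an illustration only, not code from the optimize-partition repo; the helper names `to_chromosome`, `to_partition` and `canonical` are made up for this example.

```python
def to_chromosome(partition, n):
    """Encode a list of pieces (sets of elements 0..n-1) as a list of piece labels."""
    chromosome = [None] * n
    for label, piece in enumerate(partition):
        for element in piece:
            chromosome[element] = label
    return chromosome


def to_partition(chromosome):
    """Decode a chromosome into an unordered set of pieces."""
    pieces = {}
    for element, label in enumerate(chromosome):
        pieces.setdefault(label, set()).add(element)
    return {frozenset(piece) for piece in pieces.values()}


def canonical(chromosome):
    """Relabel pieces in order of first appearance, giving one representative per partition."""
    relabel = {}
    return [relabel.setdefault(label, len(relabel)) for label in chromosome]


P = [{0, 1}, {2}, {3, 4}, {5}]
print(to_chromosome(P, 6))            # [0, 0, 1, 2, 2, 3]
print(canonical([2, 2, 1, 0, 0, 3]))  # [0, 0, 1, 2, 2, 3]
```

Decoding both $[0,0,1,2,2,3]$ and $[2,2,1,0,0,3]$ with `to_partition` yields the same set of pieces, which is the redundancy described above.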

To apply the Genetic Algorithm, we need a way for chromosomes to "cross" with one another to create new chromosomes. Our first attempt at crossover interleaves partition placement decisions of the parents. So for example,

$[0,0,1,2,2,3]$ crossed with $[2,0, 2, 3, 1, 3]$ gives $[0,0,1,3,2,3]$.

The first chromosome determines the first value, the second one the second, the third comes from the first, and so on. Sometimes, crossover defined this way results in "holes" (unused partition pieces). Those have to be discarded when we are searching for fixed size partitions.
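A minimal sketch of this interleaving crossover, together with a check for holes, might look like the following. The names `crossover` and `has_holes` are illustrative and are not taken from the Java framework.

```python
def crossover(first, second):
    """Take even positions from `first` and odd positions from `second`."""
    return [a if i % 2 == 0 else b for i, (a, b) in enumerate(zip(first, second))]


def has_holes(chromosome, k):
    """True if some label in 0..k-1 is unused, i.e. some partition piece is empty."""
    return set(chromosome) != set(range(k))


child = crossover([0, 0, 1, 2, 2, 3], [2, 0, 2, 3, 1, 3])
print(child)                # [0, 0, 1, 3, 2, 3]
print(has_holes(child, 4))  # False

# Two valid 3-piece parents whose child collapses onto a single piece.
collapsed = crossover([0, 1, 0, 2], [1, 0, 2, 0])
print(collapsed)                # [0, 0, 0, 0]
print(has_holes(collapsed, 3))  # True
```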

Our first idea for mutation is just random reassignment of a single value.

The algorithm starts with an initial population of chromosomes, usually randomly generated.

Then successive populations are generated as follows:

1. Compute fitness for each chromosome in the current generation.
2. Pass the top elitismRate fraction of chromosomes directly through to the next generation.
3. While the next generation still has space, randomly select 2 parents from the current generation and
   - with probability crossoverRate, replace the parents with each crossed with the other;
   - with probability mutationRate, replace the resulting pair with mutation applied to each element.

Evolution continues until numGenerations generations have been created.
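The loop above can be sketched in plain Python. This is a toy re-implementation under stated assumptions, not the Apache Commons Math API or the optimize-partition code: parents are drawn uniformly at random (the post does not specify a selection policy), children with holes are dropped, at least one elite chromosome is always kept, and the names `evolve` and `balance` and all default parameter values are hypothetical.

```python
import random
from collections import Counter


def crossover(first, second):
    """Interleave: even positions from `first`, odd positions from `second`."""
    return [a if i % 2 == 0 else b for i, (a, b) in enumerate(zip(first, second))]


def mutate(chromosome, k, rng):
    """Reassign one randomly chosen element to a randomly chosen piece."""
    mutated = list(chromosome)
    mutated[rng.randrange(len(mutated))] = rng.randrange(k)
    return mutated


def is_valid(chromosome, k):
    """A fixed-size partition must use every one of the k labels."""
    return set(chromosome) == set(range(k))


def random_chromosome(n, k, rng):
    """Rejection-sample a random assignment with no empty pieces."""
    while True:
        candidate = [rng.randrange(k) for _ in range(n)]
        if is_valid(candidate, k):
            return candidate


def evolve(fitness, n, k, population_size=40, num_generations=50,
           elitism_rate=0.1, crossover_rate=0.9, mutation_rate=0.2, seed=0):
    """Return (best chromosome, best fitness seen at each generation)."""
    if n < k:
        raise ValueError("need at least k elements to fill k pieces")
    rng = random.Random(seed)
    population = [random_chromosome(n, k, rng) for _ in range(population_size)]
    history = []
    for _ in range(num_generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        # Elitism: the top chromosomes pass through unchanged.
        num_elite = max(1, int(elitism_rate * population_size))
        next_generation = ranked[:num_elite]
        while len(next_generation) < population_size:
            pair = rng.sample(population, 2)
            if rng.random() < crossover_rate:
                pair = [crossover(pair[0], pair[1]), crossover(pair[1], pair[0])]
            if rng.random() < mutation_rate:
                pair = [mutate(c, k, rng) for c in pair]
            for child in pair:
                # Discard children with holes; stop once the generation is full.
                if is_valid(child, k) and len(next_generation) < population_size:
                    next_generation.append(child)
        population = next_generation
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


def balance(chromosome):
    """Toy objective: prefer pieces of equal size (0 is the best possible value)."""
    sizes = Counter(chromosome).values()
    return -(max(sizes) - min(sizes))


best, history = evolve(balance, n=12, k=3)
print(best, history[-1])
```

Because the best chromosome always survives through elitism, the best fitness recorded in `history` never decreases from one generation to the next.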

Experiments

Unit tests in optimize-partition apply the GA to some partition search problems with known optimal solutions. In each case, with suitable parameters and enough generations, optimize-partition finds an optimal solution.

Spread the max value

Suppose $U = \{0, ..., 99\}$ and $g: U \to \mathbb{R}$ is defined by $g(i)= 10$ for $i < 5$ and $g(i) = 0$ for $i \geq 5.$

Then define partition fitness by summing the max value of $g$ over each piece of the partition. So, for example, the $5$-piece partition

$P = \{\{0, 1\}, \{2, 3\}, \{4, 5\}, \{6\}, \{7, ..., 99\}\}$

has fitness $10 + 10 + 10 + 0 + 0 = 30$.

Optimal $5$-piece partitions are those that take the five initial elements of $U$ (where $g$ takes the value $10$) and spread them across the pieces of the partition. The maximum attainable fitness is $50$.
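As a sketch, this objective can be written directly against the chromosome encoding. The function name `spread_fitness` is illustrative; the unit tests in the repo are Java and may structure this differently.

```python
def g(i):
    """g(i) = 10 for the first five elements of U, 0 otherwise."""
    return 10 if i < 5 else 0


def spread_fitness(chromosome):
    """Sum, over the pieces of the partition, of the max value of g on that piece."""
    best_in_piece = {}
    for element, label in enumerate(chromosome):
        value = g(element)
        best_in_piece[label] = max(best_in_piece.get(label, value), value)
    return sum(best_in_piece.values())


# The example partition P: {0,1}, {2,3}, {4,5}, {6}, {7,...,99}.
example = [0, 0, 1, 1, 2, 2, 3] + [4] * 93
print(spread_fitness(example))  # 30

# One optimal partition: elements 0..4 each land in a different piece.
optimal = [0, 1, 2, 3, 4] + [0] * 95
print(spread_fitness(optimal))  # 50
```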

The test cases in the class TestOptimizePartition show that the GA consistently finds optimal partitions for this problem.

Cluster analysis

The test cases in the class TestClusterPartitionOptimizer test a form of GA-clustering on randomly generated test datasets. The universe for each test consists of 50 random points in $\mathbb{R}^3$. Each point is a random deviate from a pre-determined centroid created by adding Gaussian noise to each component. Each centroid has 9 deviates around it. The centroids are generated to be at least 10 units apart and the component noise has standard deviation $0.1$, so there is (almost certainly) a unique solution and the clustering problem is easy. The objective function is the negated sum of intra-cluster pairwise Euclidean distances. The tests show that the GA finds the unique optimal solution consistently after 100 generations.
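The clustering objective is easy to state in code. The following is a minimal sketch of "negated sum of intra-cluster pairwise Euclidean distances" over the chromosome encoding. The name `cluster_fitness` and the tiny dataset are illustrative and are not taken from the test class.

```python
from itertools import combinations
from math import dist


def cluster_fitness(points, chromosome):
    """Negated sum of Euclidean distances between all pairs of points sharing a cluster."""
    clusters = {}
    for point, label in zip(points, chromosome):
        clusters.setdefault(label, []).append(point)
    total = sum(
        dist(p, q)
        for members in clusters.values()
        for p, q in combinations(members, 2)
    )
    return -total


points = [(0, 0, 0), (3, 4, 0), (10, 0, 0)]
print(cluster_fitness(points, [0, 0, 1]))  # -5.0
print(cluster_fitness(points, [0, 1, 0]))  # -10.0
```

Higher fitness is better, so grouping the two nearby points together scores above grouping the two distant ones.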

Our implementation of this example is basically the same as that described in A Genetic Algorithm Approach to Cluster Analysis. See that paper for more empirical performance and accuracy results.

Thursday, April 17, 2025

Leading in uncertain times

At the beginning of the pandemic, I wrote the post, "Put on your own mask first." That was good advice then and it is good advice now. But the foundational challenges that we are facing today require a little more than just maintaining a confident and positive attitude.

Acknowledge uncertainty and factor it into your plans, but have a plan and stick to it

People need to know that their leaders have a plan. They don't need to know all of the details and they don't have to understand the full context, but they do need to know that there is a plan and they need to understand the basic rationale behind it. Nothing makes people more nervous than the feeling that their leaders don't have a coherent plan. It is OK for the plan to change when it has to, but at any given time, there has to be a plan of record that team members can anchor themselves to.

Encourage questions and answer them honestly

One of the hardest things about leading in uncertain times is that you face this double-whammy of not having buttoned-up answers to everything but needing to get in front of the team more often and sometimes with little time to prepare. Hiding from the team until you have a polished message or just talking at them is the absolute worst thing to do in these times. Show up as your authentic self and keep coming back to the plan and the principles underneath it.

Show your team that they can count on you personally

In difficult times, bad leaders break. Even good leaders make mistakes. But their teams see them acknowledging their mistakes and doing everything possible to recover from them. Your team needs to see you as the one who is going to lead them out of whatever kind of mess they or your business have gotten into. They need to feel like you can solve any problem, even though in fact the way that you do that is by getting them to think about the problem in the right way.

Manage conflict proactively

Conflict is natural and can be a healthy part of team dynamics. Like a controlled burn, however, it can have really bad effects if it is not managed. In uncertain times, it's as though there is a ton of very dry tinder just waiting to explode on your team. You need to be extra vigilant to control the burn in these times.

Keep up the pace

When I run with my dogs, if the pace is too slow, they get distracted and the whole thing kind of falls apart. The same applies to teams. Stop/start, indecision, putting things "on hold" - you can't let these things slow the pace to the point where the distractions kick in. In uncertain times there are lots of distractions. You need to keep the dogs barking. Similar to managing conflict, this requires that you anticipate the stops and starts and keep things moving.

Celebrate initiative, agency and control along with success

Celebrating success is always important, but in uncertain times it really helps to link and label the evidence of initiative, agency and control that led to the success. When people have a sense that a lot of their world is "out of control" it really helps to show them, ideally with something that they have just completed, that in their job at least, they do have agency and control and all they have to do is take initiative.

In uncertain times, people need more from their leaders. You absolutely need to "put your own mask on first" and make sure you have the support of your own leaders, peers, friends and family, but you need to show up for your team - even when you don't have all of the answers and when you may be facing doubts yourself. Just taking the hard question, celebrating a small success, or helping personally with a small problem can have a big impact.

Friday, March 21, 2025

The Great Experiment

The following is the transcript of a commencement address that I gave at the Nora School in 2015.

First, let me thank you for allowing me to share this wonderful day with you. Let me add to the thanks that are rightfully exchanged today. Thanks to the parents, whose great works are being celebrated today. Thanks to the teachers, whose combination of pride and sadness of the ladder being kicked away I can personally relate to. And thanks to those who make this place, the Nora School, a place where students can grow. There is no more important work than what you do here and no more precious community than what all of you - students, teachers, parents, administrators, staff and friends - have built here. Thanks so much for letting me be a part of it today.

I want to talk to you today about another community that we all belong to. That community was described by one of its founders as “a great experiment.” It’s been more than 200 years, but in a lot of ways, it’s still an experiment. I am, of course, talking about the United States of America. Now before anyone heads for the exits, let me assure you that I am neither headed off into a xenophobic rant nor any kind of political diatribe here. I just want to think a little bit about what it means to be part of the American experiment today and what you who are inheriting its leadership can do to help it succeed. Whether you are citizens or not, patriots or not, residents or not, our community welcomes you and we need your help.

When I was a bit younger than you, we had - and lost - an inspirational president who challenged us with the oft-quoted words, “Ask not what your country can do for you, ask what you can do for your country.” Out of context, these words lack power. When you add the words that precede the famous quote, you see that he is not just talking about casual volunteering, or some kind of “discretionary” effort. He is talking about the life’s blood of the Republic. Just before the famous quote, he says,

“In the long history of the world, only a few generations have been granted the role of defending freedom in its hour of maximum danger. I do not shrink from this responsibility--I welcome it. I do not believe that any of us would exchange places with any other people or any other generation. The energy, the faith, the devotion which we bring to this endeavor will light our country and all who serve it--and the glow from that fire can truly light the world.”

Kennedy was acutely aware, as all great American leaders have always been aware, that the only way that this great experiment can succeed is through the strength, ingenuity, resourcefulness, independence and passion of an informed electorate committed to actually making it work.

I am asking you today to become that - the generation that transforms American democracy. There is a lot of work to do. First, you need to really study important issues and insist on engaging in open, honest, probing and visionary debate about them. What are the important issues? I can tell you the ones that are important to me; but what really matters is what is important to you and why it’s important. We can easily agree that the ad hominem, sound-bite nonsense that dominates political dialog today is not important. So demand something different. Find and support candidates who bring something different. Talk to your friends and family and get them to demand something different.

I know not all of you are interested in politics and some of you may be so disillusioned by what you see in the politics feed that you just want to #fail it and let others worry about it. Realistically, I can’t expect many of you to really engage directly. I get that. But that doesn’t mean you can’t respond to Kennedy’s challenge. Your example and influence, the choices you make, what you think about and talk about every day - all of these things contribute to defining who we are as a community and what we demand of our leaders.

The “who we are” part is what I want to challenge you to think about today. I am not going to give you answers. I am not going to tell you who to be. I am just going to give you some things to think about. And I want you to keep in mind the basic fact that like it or not, who we choose to be as individuals determines who we can be as a community. To hack JFK, I guess my main point today is “ask not who your country is, ask who you are.”

One of my favorite passages in literature is the beginning of the Platonic dialog, the Protagoras. The dialog starts with a young Athenian, Hippocrates, awakening Socrates before dawn with the urgent request that he come introduce him to Protagoras and get him to agree to take him on as a student. Protagoras is a sophist - an itinerant peddler of what might be called finishing school services for aristocratic Greeks. He promises to teach his students rhetoric and arete (variously translated virtue or excellence). Hippocrates wants desperately to get this training; but he has a hard time answering Socrates’ questions about what exactly he expects to get out of it. At one point, when it has become clear that Hippocrates really has no idea what he is going to learn from Protagoras, Socrates gives him the following warning:

“Knowledge cannot be taken away in a parcel. When you have paid for it, you must receive it straight into your soul. You go away having learned it and are benefited or harmed accordingly.” He is making a very important point here. You don’t get to take knowledge back to the store if it turns you into a person that you don’t want to be. You are what you learn. That’s the first thing that I want you to think about. When you decide what to study, who to listen to, who to work for, who to marry, who to pray with - all of these decisions are going to have irreversible impacts on who you are. And who we are. I will offer you the same simple advice that Socrates gives Hippocrates - when making these decisions talk to - and listen to the opinions of - those who you know and respect.

The second thing that I need you to think about is the influence that you have on others. It’s a little unfair, but everything you do sets an example. In a convoluted but eminently logical statement, Immanuel Kant once said, “Act only on that maxim that you can at the same time will to be a universal law of nature.” This “categorical imperative” is the cornerstone of Kant’s moral philosophy. Often paraphrased as “don’t make an exception of yourself,” what it actually means is more than that: make an example of yourself. Like I said, that is not fair, especially for a young person. But it’s the hand we’re dealt. I remember when I was just a couple of years older than the seniors graduating today, I learned this lesson in a very painful way. I lost a friend and the world lost an amazing human being. I could have been a better example and I could have used my influence to prevent a tragedy. I did not. And I will never forgive myself. Most examples are less dramatic than that, but the older you get, the more you will look back on your life and ask yourself, how were people better for having known me? How were communities better for having included me? We are all individually the products of the examples that others have set for us and we are collectively the result of those that we choose to follow. The waves of social change are formed from little ripples when people decide what is cool, what is acceptable, what is inspiring, what is expected. Little by little, the small things we go along with end up turning into big swells that carry us all along. You may not think of yourself as a trendsetter, a role model or an example to emulate - but like it or not, you are all of these things, all the time. Think about the example you are setting. Think about what you are defining as acceptable, inspiring and expected.

If the first part of your life has been about making sense of the world, it’s now time to start thinking about making sense to the world. Individuality, creativity, spontaneity and adaptivity are all wonderful things that we welcome and need from you. But as a French jazz player once put it to me, you can’t just play “n’importe quoi” (just anything). Somehow, just as your experience needs to hang together in a way that allows you to say “I think…” before every perception that you have about the world, so what you do and say needs to make sense to those around you, so they can say, “she thinks…” so they can play along, variously being inspired by and inspiring you. We get the word ‘integrity’ from the same root that gives us ‘integer’ - a unified whole. Something or someone with integrity is first and foremost one thing. A structure that has integrity holds itself together. A person with integrity thinks, feels, speaks and acts with one voice – always the same. Sure, your voice will grow, adapt, and evolve over time. Just bring us along with you and we can all make better sense of the world and make better decisions.

I have asked you to think about three things today - who and what you allow to influence you, the example you set for others and how you can really be one person in the world. I have asked you to do this because how well you do with each of these challenges will determine how well any community that includes you will do. How well you respond will be the difference between a failed experiment and the realization of the great dream shared by every generation before you. This great experiment really can succeed. We really can realize Kennedy’s dream. Like Kennedy himself and every other human who has ever walked this earth, we all have strengths and limitations, proud moments and moments of shame, kind moments and mean-spirited ones. We’re not going to be perfect and we’re certainly not always going to agree. We just need to share the commitment to really think about who we want to be and to try to stay true to that vision. If we hold fast to this commitment, we will see in our Republic what Plato envisioned 2000+ years ago: “Justice writ large” growing from honest, self-critical dialog within and among individuals. We just need to really care about who we are and gently but firmly call each other out when what we are doing just doesn’t make sense.

What I am asking you is very hard. Day-to-day pressures and rewards will often pull in the opposite direction - toward shortsighted, selfish, mindless and retreating actions and habits. You will work with and for people who lack vision and integrity. You will be part of groups that tolerate and even encourage bad behaviors. Groups where what is accepted and expected makes no sense to you. When you see that happening around you, you need to stand up and take risk - calling out the bad behaviors and challenging the group to define itself. The courage that you show in these moments is every bit as important as the courage shown by the bravest soldier in the fiercest firefight. You are both doing the same thing - defending an idea. And we need desperately for you to do that.

Senator John McCain provided an example of this kind of courage on the floor of the Senate last year. He called us out for doing something that did not make sense to him. He said, “In the end, torture’s failure to serve its intended purpose isn’t the main reason to oppose its use. I have often said, and will always maintain, that this question isn’t about our enemies; it’s about us. It’s about who we were, who we are and who we aspire to be. It’s about how we represent ourselves to the world.”

He goes on to say, “When we fight to defend our security we fight also for an idea.” The “idea” that McCain refers to is the same idea that motivated George Washington and the other founding fathers to launch the great experiment that is the United States of America. The same idea that Abraham Lincoln hoped could “long endure” and that JFK challenged my generation to defend in its “hour of maximum danger.”

It’s easy to look around us today and see examples of terrible leadership, failed institutions, structural inequity and dysfunctional politics. It’s easy to give up - blaming “bad people” who have somehow co-opted our corporations and political institutions. But those people are us. When you look inside these institutions, you will see people just like you and me - all trying to find their way in the world, all looking to each other for inspiration and approval. These organizations really can be transformed from within. Every leader of every institution today will eventually be replaced by people in your generation. You can, as Mahatma Gandhi so succinctly put it, “be the change you want to see in the world.”

One day, you will stand where I stand today, asking yourself what you are leaving the next generation and asking them to step up and lead. I sincerely hope that you will have the same optimism that I have now. The optimism of an excited scientist who feels like she is on the cusp of a great discovery. The optimism of a proud parent who sees a better future for his children. Better not just economically, but socially and culturally because he envisions them as not just better off than him but better than him and part of a better community.

You can lead us to realize Kennedy’s dream if you step up to the simple challenge that I have laid out for you today: Develop yourself. Set an example. And be who you are. Now, to end with my favorite quote from Plato, “let us be going.” Thank you.

Monday, May 27, 2024

How to become a committer

Wearing three different hats over the years, I have received three different versions of the same question, to which I always respond, "You are asking the wrong question."

1. What do I need to do to get an A in your class?
2. What do I need to do to get promoted?
3. What do I need to do to get commit?

This post is about question number 3., but the basic concept in all three is the same: you need to change "get" to "earn" or "become a good candidate for" to focus on the right question.

So what does it mean to be a good candidate for commit?

1. Strongly net positive energy flow. Not everything you do in an OSS community adds energy. Sometimes you will ask stupid questions, submit bad PRs, take offense, offend others, lick a cookie, or do other things that the community would be better without. These things need to be balanced by good questions, helpful PRs, correct answers and other community-helpful things. We all have our bad days and stupid moments, so don't obsess over always being "right," but do try to make sure that when you ask yourself "Am I really being useful in this community?" the answer is a strong "yes."
2. Real mastery of some aspect of the project. Commit means you are trusted to merge PRs. In some projects, commit may be limited to certain branches or docs or whatever; but the basic idea is the same in every case: you are trusted by the community as a steward for the project's assets. To earn that trust you have to demonstrate real mastery of some part of the code, documentation or other non-code assets of the project.
3. Understanding and following the ways of the project. OSS communities vary widely in how they work. This kind of overlaps with 1., as if you don't understand and follow the written and unwritten rules of the project you will end up being an energy sink as people will have to correct you all the time. Of course, healthy OSS communities are always open to new ideas about how to do things, so if you don't like the way things work in a project initially, you may be able to drive change later. But you will never be able to do that or anything else useful unless you first take the time to learn how things are done and initially adjust your personal style as necessary.

For the remainder of this post, I am going to focus on practical strategies to achieve number 2. as a code contributor. But it is really important that you also achieve 1. and 3. and most importantly you have to really want to achieve all three.

Many of the best contributors to open source projects start off as users of the software. This is usually the best way to start. The ideal scenario is that you are using the software as part of your day job, or some component of something you work with uses something from the project. If that is not the case, you should try to find a work or personal project that uses software from the project in some way.

Start by really mastering the code in your own project that touches the OSS. For example, suppose that you are interested in getting involved in Apache Kafka and you have a project at work that uses it. Look carefully at the code that uses the Kafka client APIs and the configuration of the Kafka system components. These things may be hidden from you by an abstraction layer somewhere. If so, go find that code. Start by understanding why working code using the project works. Make sure you understand why the specific APIs being used are the right ones to use for what the code is doing. Or if you are starting something that uses the project, get it to work and make sure you can explain exactly why it works. Confirm this with tests in your own project.

At first, confirm your understanding of how your app works just using the documentation, other online sources and your own testing. Then take the leap to look at some code inside the project. Sometimes the code that your own code interacts with directly is not very enlightening or it may be difficult to understand. That's OK. Go find some other code that your code is likely exercising and that looks more interesting or understandable. Look at its documentation, unit tests and recent commit history.

After poking around for some time, making changes to your code and watching what happens when you play with release sources and binaries, you can take the next step, which is to build the software. Depending on the project, this may require some patience and even some special tools or access to a special environment. OSS communities die if it is not possible for newbies to figure out how to build the software, so there has to be a way. You need to figure it out. First look for build docs. Most projects have them. Try your best to get the build to work, but don't spend many, many hours stuck. When you do get stuck, go back over everything you think you know about the build, look through project archives and docs and if you are still stuck, come up with the simplest possible question the answer to which is likely to get you unstuck. Ask the community that question. Often some script, doc, test or main code is either misleading or broken and that simple question can be very helpful - especially if you get a simple answer and your first contribution is to fix whatever is misleading or broken so the next newbie does not get similarly stuck. That is being net positive. Trying once and asking for help immediately is not.

Once you can build the software, you can make changes to it and watch what happens. A fun game to play is to see if you can do things that won't break the build but will make an observable difference in your application. Even adding log messages or debug print statements can help build understanding. If the project has good tests, breaking them will be easy. Intentionally breaking tests and explaining why your change breaks them is a very good way to learn the stated and unstated invariants in the code.

The play steps above may not seem like a direct path to mastery, but if you skip them and try to go directly to attacking issues or conceiving a great contribution, you will end up stuck and frustrated. A new codebase is like a new neighborhood. If you just use GPS all the time to go as fast as possible to chosen destinations it will take you a long time to actually learn the place. If you allow yourself to walk around a bit you will not need the GPS as much and you will end up always knowing not just one, but several ways to get to where you want to go.

A good place to start contributing is in tests and / or documentation. Assuming that you have found and penetrated an area of the code to the point where you have a decent understanding of its behavior, you can ask yourself if the existing docs and unit tests fully explain and confirm the behavior. Almost always, you will be able to find some things that you are not sure about or that seem vague or misleading in the documentation. Write tests to first discover, then confirm the behavior. Then ask the community if in fact this is the desired / expected behavior. If it is, create a PR that includes the unit test and a patch to the documentation that clarifies the contract of the code. Make sure that your PR passes all tests and works with whatever CI system the project uses. Keep things as simple as possible and don't try to combine too many things into one PR. Simple PRs that improve documentation and tests tend to be thankfully accepted. Focusing on tests and docs initially also helps deepen your understanding of the code.
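As a small illustration of "discover, then confirm," here is what such tests might look like. The stand-in "project" is just Python's built-in `str.split`, chosen only because its behavior is easy to check; the test names are made up, and a real contribution would target the project's own code and test framework.

```python
def test_split_without_separator_drops_empty_strings():
    # Discovery: with no separator, runs of whitespace collapse and
    # an empty input produces an empty list.
    assert "a  b".split() == ["a", "b"]
    assert "".split() == []


def test_split_with_explicit_separator_keeps_empty_strings():
    # Confirmation of the contrasting case: an explicit separator keeps
    # the empty strings between adjacent separators.
    assert "a  b".split(" ") == ["a", "", "b"]
    assert "".split(" ") == [""]


test_split_without_separator_drops_empty_strings()
test_split_with_explicit_separator_keeps_empty_strings()
```

Tests like these pin down a behavior precisely enough that the accompanying documentation patch can state the contract in one sentence.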

Another good place to start when the opportunity presents itself is on straightforward, labor-intensive tasks. Upgrading dependencies, replacing deprecated methods, adding annotations, fixing linter errors, or carrying out other boring, but useful refactoring or code improvement tasks are all things that in some cases can be done without deep knowledge of the code, but which can be very helpful. Make sure to pay careful attention to tests and carefully review anything that you generate with refactoring or AI tools if you take on this kind of task. When upgrading dependencies also make sure to review release notes and test coverage for uses of the dependent code. Break things into small PRs and make sure not to mix formatting or other kinds of changes with the specific improvements that your PRs claim to make. Take extra time to make sure that your changes are correct, taking advantage of the opportunity to deepen your understanding of the code and tests.

Different communities use different forms of communication. Make sure to subscribe to all relevant channels and try to follow as much as you can. At first, a lot of the conversation, issues and PRs will be hard to follow, but over time more of it will make sense. Start by paying special attention to your chosen area of focus. If you see a question that you can answer or a problem that you might be able to solve, go for it. Remember the net positive energy rule though: it's OK if you don't get everything exactly right immediately, but you need to be net positive - more useful ideas and contributions than distraction.

As you learn more about the code and community, you can start attacking bigger issues or bringing new ideas of your own. Don't focus on impressing people or talking about what you have done. Self-promotion does not work in OSS communities and is in general not necessary because everything happens in public, visible to the whole community. What matters is what you contribute and how you work in the community. If you consistently contribute high-quality PRs and participate positively in the community, one day you will be surprised to learn that you have been voted in as a committer.

Tuesday, January 23, 2024

How to read

When I was first starting to read research papers in mathematics, I got some great advice from one of my professors. He said, "Always have paper and pencil with you when you read a paper. Read a little bit and then try to write the next part yourself. Look at what is written in the paper as a sequence of hints. That's all you are going to get. You need to fill in the details yourself and if you can't do that, you have not understood the paper." Over the years, I have realized that while research mathematics is kind of an extreme case, the same actually applies to any challenging text. So here is "the method":

1. Read a sentence or paragraph or however much you need to get an idea.
2. Write or say to yourself what you think is going to come next.
3. Start from the beginning and read through the next chunk. Compare your continuation to what actually came next.
4. Go to 2

You end up re-reading the whole piece many times this way. For long things, use major breaks like chapters or whatever to limit the look back.

If you are trying to really learn the material, you can do the whole process repeatedly. In that case, the checking in step 3 should start to show less and less divergence, mostly just style or sequencing. You need to be careful though not to devolve into memorization. You want to actually come up with the ideas that come next, not the words.

I do this kind of thing when I read hard material of any kind - not rigidly and sometimes changing chunks around. If I go slowly enough, unless the material is really over my head or I am lacking needed background or something, I always end up feeling like I have had the ideas that the author was trying to convey. That usually means that I can start to apply the ideas myself.

Sunday, January 14, 2024

Why I love mathematics

Millie and Al's https://www.flickr.com/photos/27480193@N05/

I love mathematics because it never says one thing and does something else. If it ever seems to do that, it is always because I am missing some idea. I never stay mad at mathematics.

I love mathematics because it is always there, waiting for me. It will always be there even if I don't jump on it right now. It won't run away or turn into some not fun thing. When I go back over mathematics that I haven't looked at in a while, it's like going back to the old neighborhood and having that warm and happy feeling you get when nothing has changed.

I love mathematics because it loves me. Mathematics has infinite patience for me. I can be arbitrarily stupid for arbitrarily long. Mathematics keeps the light on for me.

I love mathematics because it surprises me and makes me think differently all the time. I feel like Aeneas in the world of mathematics, constantly meeting monstra mirabile dictu, but without the carnage.

Friday, September 15, 2023

How not to get breached (too badly)

The other day, someone asked me if I had ever experienced a major security breach. They were shocked when I responded that I had not. That is because I have been a CTO for the past 18 years, including stints at some bleeding edge SaaS companies that were constantly under attack. I have managed many security incidents, but none that resulted in a major breach. While of course it is possible and even likely that dumb luck has played a role in my success, I think that the following things have certainly helped. Each of the imperatives below has been important to me. I have never achieved perfection in any of them, but I have never stopped pushing. The imperatives are written from the standpoint of a CTO, but anyone who cares about security can use them to help protect their company.

1. Know your risks and be honest and transparent about them

You should always have a top n list of key risks that you are worried about, and n should be less than or equal to 10. You should have a weekly conversation with your security officer that reviews each risk, mitigation plans, compensating controls and any help that the security team needs in managing it. It is critical that these conversations be open, honest, comprehensive and engaged. If you can't understand some of the technical security content, you need to study it and keep asking questions until you understand it. You need to build a culture on your team that does not sugar coat or hide risks and your response to learning about them needs to be consistently positive, supportive and action-oriented. You need to never appear to be annoyed by risks or angry at those who report them. You also need to review critical risks at least quarterly with your partners on the executive team. These reviews need to be open, interactive, transparent and engaging. Your objective is not to show that you have things under control, to justify investment or to cover your ass. Your objective is to proactively cover their ass. If you do a good job clearly representing risks, mitigation plans and compensating controls, you will get funding and leadership support as a side effect.

2. Focus on actual risk

Don't let compliance, vendors, talking heads or your own cool ideas distract you from systematically reducing risk. If you do a good job identifying and managing risk, you will achieve compliance as a side effect. Attackers don't care how "buttoned up" your security program looks or how beautifully illustrated your risk registry is. What matters is where you are weak and what you have done to compensate for your weaknesses. Note that this is exactly the same thing that auditors care about. Try to put every possible cycle into things that significantly reduce actual risk. Compensating controls can be ugly and low tech, but they can save your ass. I am certain that some of the ugliest and most cumbersome shims that I have put in while working on permanent solutions have saved me and my companies from great harm.

3. Pull on every thread

When you have evidence of an attack or vulnerability, make sure that you investigate everything fully. Don't just celebrate having thwarted or annoyed the attackers into leaving. You need to have people whose job it is to investigate security incidents and you have to give them the time, tools and access to do their jobs. Ask probing questions and don't assume that your initial analysis of an anomaly is correct. I wrote a few years back about fully solving problems. That same thinking applies here - especially the part about "widening bugs."

4. Make security@ a welcoming and helpful place

Respond kindly and helpfully to all security-related questions or concerns from employees. Don't put the burden on the reporter to establish that an issue is worth investigating. You want them to come back, not to feel stupid or ignore things. You also want them to learn. The most important single thing that any company can do to reduce security risk is to develop or acquire high-quality security awareness training and make sure that everyone takes it. If your company produces any kind of software (including for internal use), make sure to also acquire - and require - high-quality secure application development training. It's very important that all training, consulting, reviews and standards promoted by your security team be of the highest quality and backed by friendly people who understand your business and are willing to patiently answer questions and help people understand security concepts.

5. Stay current

Attack types and vectors change all of the time. Many of these changes have no impact on your environment, but you need to make sure you see and assess the impact of new threats as they emerge. Here again, you need to have people whose job includes monitoring security advisories and assessing their impact on you. You need to make sure that you understand clearly what the advisories and your team are telling you. That means you need to devote a significant amount of your professional development time to keeping up with the changing threat landscape. As a CTO, you have a unique vantage point that virtually nobody else in the company has. It is critical that as new threats emerge, you personally think through their implications across your technology estate. Just as you and your team need to stay current on the threat landscape, so your systems need to remain current in terms of patch levels. If you do not have fully automated, short-cycle patch deployment capability, this needs to be put in place. Even if it is in place, it needs to be actively maintained and continuously exercised and you need to eliminate the things that can't be patched.

6. Do real diligence on suppliers

Some companies have well-established supplier risk management capabilities. Even in those companies, however, sometimes the level of technical security diligence is not where it needs to be. You need to step back from the questionnaires and boilerplate RFP responses and think clearly about where the material risks are with suppliers and dive deeply into what controls you and they have in place to mitigate them. Again, your vantage point as CTO is critical here. You can't count on somebody else's checklist to see the full picture. You need to direct your best spidey-sense toward where vendors may be weak and adaptively probe to make sure that you have discovered all of the risks. Be especially careful with low-cost vendors. Finally, make sure that you regularly review vendors' security posture. If a vendor is acquired or the product or service you are using is slated for end of life, that should trigger a special review and contingency plan.

7. Use architecture as a weapon

Everyone knows that trying to add security to a naively implemented system after the fact is way less effective than designing it in from the start. On the opposite side of this are systems that deliver security "natively," exposing services and features in a way that is hostile to attackers. Zero trust and end-to-end encryption are examples of this. They are baked into the architecture and inherently hostile to attackers. Really effective, dynamic and fast credential management and revocation is another great weapon. The same service that allows applications to quickly get keys and secure communications can help you quickly respond to an attack or vulnerability. Constantly push for the architecture win-wins that make your systems both more robust and more secure.

8. Constantly question privilege

Make least privilege an organizing principle. Start with your own. Do you really need production access? You are a fat target. Don't have anything valuable on your laptop and don't let your account be a valuable attack vector. The same applies to everyone in your company and every process running in your environment. The less they can do, the less valuable they are as attack vectors. Constantly ask why individuals or processes need the access that they have and constantly prune. Just like patching, if your environment does not have the automation, test environments or other infrastructure required to stop individuals or batch jobs from having to have weakly limited production access, you need to fix that. If offboarding is not bulletproof when it comes to system access, make it so. Now. This sometimes looks daunting or even impossible to fix. Trust me, it never is.
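One way to make the "constantly prune" habit concrete is to compare what each person or process has been granted against what it has actually been seen using. The sketch below is only an illustration in Python: the principal names, the permission strings and the `unused_privileges` helper are all made up, and a real version would read grants and usage from your own identity and audit systems.

```python
def unused_privileges(granted, used):
    """Map each principal to the permissions it holds but was never seen using.

    granted and used both map a principal name to a set of permission strings.
    Principals with nothing to prune are left out of the result.
    """
    report = {}
    for principal, perms in granted.items():
        extra = perms - used.get(principal, set())
        if extra:
            report[principal] = sorted(extra)
    return report


# Hypothetical grants and observed usage, for illustration only.
granted = {
    "cto": {"prod:read", "prod:write", "billing:read"},
    "nightly-batch": {"prod:read", "prod:write"},
    "deploy-bot": {"prod:deploy"},
}
used = {
    "cto": {"billing:read"},
    "nightly-batch": {"prod:read"},
    "deploy-bot": {"prod:deploy"},
}

prune_candidates = unused_privileges(granted, used)
```

With this made-up data, the CTO's production access and the batch job's write access come back as pruning candidates. Each entry is a prompt for the "why do you need this?" conversation, not an automatic revocation.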

9. Don't acquire, decrypt, transmit or store data that you don't need

I often make the joke that the most secure and highly available system is the null system - the system that does nothing. To deliver business value, systems need to access and process data. But many can fully achieve their objectives with a much lighter touch on data. Here I am reminded of an army adage about conserving energy quoted by Colin Powell in his awesome leadership book: "Don’t run if you can walk; don’t stand up if you can sit down; don’t sit down if you can lie down; and don’t stay awake if you can go to sleep." Here is my version for data processing: Don't receive if you can do without; don't decrypt if you don't need to see; don't store if you don't need to persist; don't transmit if you can omit. The wonderful thing about modern distributed architectures is that they don't force you to create massive centralized honey pots of raw data. So don't.
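A minimal sketch of "don't store if you don't need to persist," assuming a service that only needs a few fields from each incoming record: keep an explicit allowlist and drop everything else before the record goes any further. The field names, the sample record and the `minimize` helper are hypothetical.

```python
# Hypothetical allowlist: the only fields this service needs to do its job.
ALLOWED_FIELDS = frozenset({"order_id", "amount", "currency"})


def minimize(record, allowed=ALLOWED_FIELDS):
    """Return a copy of record that contains only the allowlisted fields."""
    return {key: value for key, value in record.items() if key in allowed}


# Made-up incoming record carrying more than the service needs.
incoming = {
    "order_id": "A-1001",
    "amount": 25.0,
    "currency": "USD",
    "card_number": "0000-0000-0000-0000",
    "email": "user@example.com",
}

to_store = minimize(incoming)
```

An allowlist fails safe: a new sensitive field added upstream is dropped by default, where a blocklist would let it through until someone noticed.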

10. Don't assume that any zone, subnet, vm or other subsystem can be hardened

I saved the best one for last. It never ceases to amaze me how coming on 40 years now after the famous Gage/McNealy pronouncement that "the network is the computer," people keep thinking that they can somehow wall off little islands of safety and security. You need to disabuse yourself and your team of that archaic fantasy. You need to constantly assume that the bad guys are inside "your" network and focus on limiting what they can do once inside. Like multi-factor authentication, strong crypto keys and end-to-end encryption, network controls are good architectural weapons, but they are just one weapon. Like the others, they make your estate a more hostile place for attackers and slow them down, but they need the others behind them.

Some of these imperatives might seem to limit or constrain what you can do with your products or how fast you can move. They absolutely don't have to. For example, number 9 does not say you can't store any data. It just says you should limit what you store. And guess what, if you do the hard thinking in system design to limit things to what you actually need, you can move faster and do more. Once you get hard core about 8, you will also pick up speed and value because the only way to do it is to automate. The win-wins in 7 really are all over the place and once you get that thinking baked in, first in your own mind and then in your team and entire company, you will get benefits way beyond just being a harder target.