A structured approach to writing code for software and analyses

Luke Johnston

2025-10-23

Category theory

Lots of math in category theory, but we’ll only cover the basics.

Types (of an object) are math objects with specific properties

Necessary for correct mathematical composability.
You can create any type you want.

Algebraic data types: Sum types

Number of possible types/values is the sum of the types/values inside.

Colours: Red | Green | Blue
Bool: True | False

3 + 2 = 5 possible types/values
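The count above can be checked with a minimal Python sketch, assuming the two types are modelled as `Enum` classes. The names `Colour`, `Bool`, and `sum_values` are illustrative and not from the talk.

```python
from enum import Enum


class Colour(Enum):
    RED = "Red"
    GREEN = "Green"
    BLUE = "Blue"


class Bool(Enum):
    TRUE = "True"
    FALSE = "False"


# A value of the sum type is one Colour OR one Bool, never both,
# so the number of possible values is the sum of the two counts.
sum_values = list(Colour) + list(Bool)

print(len(sum_values))
```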

Algebraic data types: Product types

Number of possible types/values is the product of the types/values inside.

Colours: Red | Green | Blue
Bool: True | False

3 * 2 = 6 possible types/values
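As a rough check of this count, the sketch below enumerates every pairing with `itertools.product`. Modelling a product type as a tuple is an assumption made for illustration, as are the names `colours`, `bools`, and `product_values`.

```python
from itertools import product

colours = ["Red", "Green", "Blue"]
bools = [True, False]

# A value of the product type holds one colour AND one boolean,
# so every colour is paired with every boolean.
product_values = list(product(colours, bools))

print(len(product_values))
```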

Actions are math transformations

Transform one object to another object.
An action always produces the same output type.
Don’t need to know how, only the input and output types.

Simplified syntax for actions

Actual math syntax is more complex.

f(A: Integer) -> B: String

g(B: String) -> C: Boolean

Actions are composable

. = composition.

g(f(A: Integer)) -> C: Boolean

h = f . g

h(A: Integer) -> C: Boolean

“f composed into g” or “f followed by g”. Standard math notation writes the same composition in the opposite order, as g ∘ f.
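A minimal Python sketch of this composition, assuming hypothetical bodies for `f` and `g` (the talk only gives their input and output types). The helper `compose` is also illustrative.

```python
from typing import Callable


def f(a: int) -> str:
    """Hypothetical action: Integer -> String."""
    return str(a)


def g(b: str) -> bool:
    """Hypothetical action: String -> Boolean."""
    return len(b) > 1


def compose(
    first: Callable[[int], str], second: Callable[[str], bool]
) -> Callable[[int], bool]:
    """Return the action 'first followed by second'."""

    def composed(a: int) -> bool:
        return second(first(a))

    return composed


# h takes an Integer and returns a Boolean, like g(f(A)).
h = compose(f, g)
```

The composition only works because the output type of `f` matches the input type of `g`.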

Composing via piping

Can “chain” or “pipe” functions via composition.

g(f(A)) = f(A) . g()

Piping helps with readability.

Functor example: Map over a list

A: Integer
B: Boolean

map f(List(A, A)) -> List(B, B)

From product type to product type, but different types inside.
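The mapping above could look like this in Python, assuming a hypothetical action `is_even` from Integer to Boolean. The list keeps its shape and only the type inside changes.

```python
def is_even(a: int) -> bool:
    """Hypothetical action f: Integer -> Boolean."""
    return a % 2 == 0


numbers = [1, 2]  # List(A, A)

# map applies the action to each element and keeps the list structure.
flags = list(map(is_even, numbers))  # List(B, B)
```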

Category theory in computers: Functional programming

Programming: Art of managing the complexity of solving a problem

Complexity is only a problem for our limited human minds. Solved with:

(De-)Composition
Abstraction
Predictability

Explicit types described in function input and output

f(A: Integer) -> B: String

Use cases in data analysis

Functional programming is very common in data analysis, though often not explicitly recognized.

In R, there are few, simple objects/types

The most common object types are data.frame, vector, and list.
They are algebraic data types (e.g. data.frame and list are product types).
They are functors (a function can be applied to the internal types).
They are (mostly) immutable.

Functional piping is (now) common in R—makes more readable code

result1 <- data |> clean() |> transform()
result2 <- transform(clean(data))
identical(result1, result2)

Python doesn’t have a built-in pipe operator.
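Piping can still be approximated in Python with a small helper. The sketch below is one possible approach, not a standard library feature: `pipe` is a hypothetical helper built on `functools.reduce`, and `clean` and `transform` are stand-ins with made-up bodies.

```python
from functools import reduce
from typing import Any, Callable, List, Optional


def pipe(value: Any, *functions: Callable[[Any], Any]) -> Any:
    """Pass value through each function in order, left to right."""
    return reduce(lambda acc, fn: fn(acc), functions, value)


def clean(data: List[Optional[float]]) -> List[float]:
    """Hypothetical cleaning step: drop missing values."""
    return [x for x in data if x is not None]


def transform(data: List[float]) -> List[float]:
    """Hypothetical transformation step: double every value."""
    return [x * 2 for x in data]


data = [1.0, None, 2.5]

# Piped and nested calls give the same result.
result1 = pipe(data, clean, transform)
result2 = transform(clean(data))
```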

Data cleaning example in R

remove_missing <- function(data) { ... }
standardize_column_names <- function(data) { ... }
clean_data <- function(data) {
  data |>
    remove_missing() |>
    standardize_column_names()
}
cleaned_data <- clean_data(raw_data)

`targets` R package: Manage complex analysis pipelines

Requires writing functionally.

File: _targets.R

library(targets)
# ...
list(
  tar_target(file, "data.csv", format = "file"),
  tar_target(data, get_data(file)),
  tar_target(model, fit_model(data)),
  tar_target(plot, plot_model(model, data))
)

Code from targets documentation.

Easy parallel processing with furrr R package

Simple because of functional programming design:

library(furrr)
plan(multisession)
result <- future_map(
  list_of_data,
  transform_function
)

See furrr documentation.
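For comparison, a parallel map can be sketched in Python with the standard library `concurrent.futures` module. This is an illustration under assumptions: `transform_function` has a made-up body, and a thread pool is used so the sketch runs anywhere. For CPU-bound work, a process pool would usually be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def transform_function(values: List[int]) -> int:
    """Hypothetical pure function: same input always gives same output."""
    return sum(values)


list_of_data = [[1, 2], [3, 4], [5, 6]]

# Sequential map.
sequential = list(map(transform_function, list_of_data))

# The same map spread over worker threads; results keep the input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = list(pool.map(transform_function, list_of_data))
```

Because the function is pure, switching from the sequential map to the parallel one does not change the result.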

Use cases in software development

Strong typing helps enforce correct types and compositions

Rust

fn add(a: i32, b: i32) -> i32 {
    a + b
}

Python

def add(a: int, b: int) -> int:
    return a + b

Python type hints are not enforced at runtime. R doesn’t have these type annotations.
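A short sketch of what "not enforced" means in practice: the call below passes strings to a function annotated with `int`, and Python runs it anyway. A static type checker such as mypy would flag the call, but the interpreter does not.

```python
def add(a: int, b: int) -> int:
    return a + b


# The annotations say int, but Python does not check them at run time,
# so this call succeeds and concatenates the two strings.
result = add("a", "b")
```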

Example using sum types: Enums in Rust

enum Direction {
    North,
    South,
    East,
    West
}
Direction::North

Direction can only have one value and there are four possible values.

Sum types: Enums in Python

from enum import Enum
class Direction(Enum):
    North = "North"
    South = "South"
    East = "East"
    West = "West"

Unlike in Rust, each enum member in Python must be assigned a value.

Example monad: Option type in Rust

enum Option<T> {
    Some(T),
    None,
}

Example monad: Using `Option` type in Rust

fn divide(n1: i32, n2: i32) -> Option<i32> {
    if n2 == 0 {
        None
    } else {
        Some(n1 / n2)
    }
}
fn add_one(n: Option<i32>) -> i32 { ... }
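A similar pattern can be sketched in Python with `Optional` from the standard library. The body of `add_one` is elided in the Rust example, so the version here, which passes `None` through unchanged, is an assumption made for illustration.

```python
from typing import Optional


def divide(n1: int, n2: int) -> Optional[int]:
    """Return None instead of raising when dividing by zero."""
    if n2 == 0:
        return None
    return n1 // n2


def add_one(n: Optional[int]) -> Optional[int]:
    """Hypothetical follow-up step: only add when a value is present."""
    if n is None:
        return None
    return n + 1
```

The signature of `divide` tells the caller that a missing value is possible and has to be handled.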

Or use `Maybe` from `returns` package

from returns.maybe import Maybe, Some, Nothing
def divide(n1: int, n2: int) -> Maybe[int]:
    if n2 == 0:
        return Nothing
    return Some(n1 // n2)

returns package docs.

Easy to see errors by looking at function signature with `Result`

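fn read_file(path: String) -> Result<String, String> {
    // Any read error is reported as "File not found" to keep the example short.
    match std::fs::read_to_string(&path) {
        Ok(file_content) => Ok(file_content),
        Err(_) => Err("File not found".to_string()),
    }
}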

Use `Result` as “railway” intersection

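A Result must be unwrapped into a String before process() can use it.

fn process(data: String) -> String { ... }
// Panic with the given message if reading failed.
process(read_file("file.txt".to_string()).expect("Didn't read file."));
// Or propagate the error to the caller with `?`.
process(read_file("file.txt".to_string())?);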
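The railway idea can also be sketched in plain Python. Everything here is an assumption for illustration: `Ok`, `Err`, and `and_then` are hand-written stand-ins rather than a library API, and `read_text` replaces real file reading so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Ok:
    value: str


@dataclass(frozen=True)
class Err:
    error: str


Result = Union[Ok, Err]  # a sum type: either Ok or Err


def and_then(result: Result, step: Callable[[str], Result]) -> Result:
    """Run step on the success track; pass an Err through unchanged."""
    if isinstance(result, Ok):
        return step(result.value)
    return result


def read_text(text: str) -> Result:
    """Hypothetical stand-in for reading a file: empty text is an error."""
    return Ok(text) if text else Err("File not found")


def process(data: str) -> Result:
    """Hypothetical processing step."""
    return Ok(data.upper())
```

Calling `and_then(read_text("abc"), process)` stays on the success track, while an `Err` from the first step skips `process` entirely.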

Outline
Category theory
What is category theory?
Based on abstractions and compositions
Similar to Graph Theory
A full graph is a category
Objects can be anything
Objects can contain other objects
Types (of an object) are math objects with specific properties
Types are sets of values or other types
Express types with object: type notation
Algebraic data types: Bigger container types
Algebraic data types: Sum types
Algebraic data types: Product types
Types are necessary for actions
Actions are math transformations
Simplified syntax for actions
Actions are composable
Composing via piping
Graphically demonstrated with:
Functors: A type of algebraic structure
Functor example: Map over a list
Graphically demonstrated with:
Category theory in computers: Functional programming
Programming: Art of managing the complexity of solving a problem
(De-)composition by breaking down a problem
Abstraction: Hide complexity and details
Immutability is an implicit assumption in category theory
No side effects allowed in “pure” functions
Explicit types described in function input and output
Same input always equals same output
Functions are also types of objects
Higher order functions (e.g. for functors) that take a function as an input
Benefits of functional programming
Power comes from its mathematical foundation
Declarative (vs imperative) programming
Low-level languages abstract away the imperative steps
Declarative allows us to focus on what we want to solve and the goal
Declarative tends to need less code, is more readable
Functional design pattern that can be used in many languages
Predictability and testability
Caching and speed
Easier parallel processing with pure functions and maps
Distributed/asynchronous computing: Futures and promises
Use cases in data analysis
Foundation of SQL is functional and declarative
Excel spreadsheet formulas are functional
Typical data analysis workflow
Keep decomposing, e.g. within cleaning data
Cleaning stage would be a “category”
In R, there are few, simple objects/types
In Python, it’s more complicated
Graphically demonstrated with:
Zoom in: Convert to missingness is a functor (map) step
Applying to other cases: Mapping a model function over a list of formula
Functional piping is (now) common in R—makes more readable code
Data cleaning example in R
targets R package: Manage complex analysis pipelines
Easy parallel processing with furrr R package
Parallel processing isn’t as easy in Python
Use cases in software development
Design stage: Focus on objects and their types
Better testing and predictability
Make objects with actions, compose together
Put objects into containers as functors, use map
Big benefit: Emphasis on objects’ types
Strong typing helps enforce correct types and compositions
Only one output type per function
Solve multiple output types with railway-oriented programming
Example using sum types: Enums in Rust
Example using sum types: Enums in Rust
Sum types: Enums in Python
Monads: A way to handle side effects
Illustration of a monad and flat map
Example monad: Option type in Rust
Example monad: Using Option type in Rust
Example: Option type in Python with Optional
Or use Maybe from returns package
Example of handling errors: Result type
Easy to see errors by looking at function signature with Result
Use Result as “railway” intersection
Explicitly handle Result with match
Can use Results from returns package in Python
Summarising
Category theory
Functional programming
Resources