A structured approach to writing code for software and analyses

Luke Johnston

2025-10-23

Outline

  • Simple introduction to category theory

  • Category theory in computers: Functional programming

  • Benefits of functional programming

  • Example use cases in data analysis

  • Example use cases in software development

The first three sections are a bit more abstract and conceptual. But I’ll connect them later on when going into the practical use cases. For those who aren’t interested in doing software development, you can skip the last section. We’ll take a short break before it so you can leave.

Category theory

Lots of math in category theory, but we’ll only cover the basics.

What is category theory?

  • Branch of mathematics.
  • Deals with abstract objects and connection between.
  • Used in many fields of math and programming.

Based on abstractions and compositions

Abstraction

  • Collection of things (“objects”).
  • Actions (“morphisms”) that transform the things.

Composition

  • Actions can combine into a new action.
  • Combined action produces the same result as sequence of them.

Similar to Graph Theory

  • Graphs have nodes and edges.

  • Categories have objects and arrows (called “morphisms”).

A full graph is a category

Category = composition of all objects and arrows

A

B

C

Objects can be anything

An abstraction of anything: a number, a string, a function.

Objects can contain other objects

  • A vector: Contains sub-objects of numbers or strings.

  • A data frame: Contains sub-objects of vectors.

  • A list/collection: Contains sub-objects of any type.

Types (of an object) are math objects with specific properties

  • Necessary for correct mathematical composability.

  • You can create any type you want.

Types are sets of values or other types

A boolean type can only have two values:

  • True
  • False

An integer type can have infinite values:

  • Negative infinity to positive infinity

Express types with object: type notation

A: Integer
B: String
D: Function

Can be empty (()):

y: ()

Product/sum type:

C: List(A, B)

Algebraic data types: Bigger container types

  • Holds other types or values.

  • Two kinds: Sum types and product types.

  • Can compose together (e.g. sum in product).

Really important later on.

Algebraic data types: Sum types

Number of possible types/values is the sum of the types/values inside.

Colours: Red | Green | Blue
Bool: True | False

3 + 2 = 5 possible types/values

These become important later on.

Either a specific Colour or a specific Bool.

Algebraic data types: Product types

Number of possible types is the product of the types inside.

Colours: Red | Green | Blue
Bool: True | False

3 * 2 = 6 possible types

Can have either Red and True, Red and False, etc.

Types are necessary for actions

Actions take (input) a type and output a type.

Actions are math transformations

  • Transform one object to another object.

  • Action will always produce same type.

  • Don’t need to know how, only the input and output types.

Simplified syntax for actions

Actual math syntax is more complex.

f(A: Integer) -> B: String

g(B: String) -> C: Boolean
  1. is the input type, -> is the output type.

Actions are composable

. = composition.

g(f(A: Integer)) -> C: Boolean

h = f . g

h(A: Integer) -> C: Boolean

“f composed into g” or “f followed by g”

The syntax is a bit different from normal function notation, but it’s easier to see the composition.

Composing via piping

Can “chain” or “pipe” functions via composition.

g(f(A)) = f(A) . g()

Piping helps with readability.

Graphically demonstrated with:

f()

g()

h()

A: Integer

B: String

C: Boolean

If A goes to C via B with the use of f and g, then there must also be an arrow from A to C directly via another function h, which is the composition of f and g.

Functors: A type of algebraic structure

  • Type that allows an action to be applied to types inside.

  • But keeps the overall structure.

  • E.g. lists are usually functors.

Functor example: Map over a list

A: Integer
B: Boolean

map f(List(A, A)) -> List(B, B)

From product type to product type, but different types inside.

Graphically demonstrated with:

List 2

List 1

f()

f()

map F

A: Integer

A: Integer

B: Boolean

B: Boolean

F here is the type “function” that map uses as input.

Category theory in computers: Functional programming

Programming: Art of managing the complexity of solving a problem

Complex is only a problem for our limited human minds. Solved with:

  • (De-)Composition
  • Abstraction
  • Predictability

The core of programming is abstraction and composition.

We use composition because it helps us manage complexity, as our human minds are limited by working memory, which can only hold a few pieces of information at a time.

So we build small pieces of action that compose together to do larger, more complex action.

(De-)composition by breaking down a problem

  • Take a bigger problem
  • Decompose (break it down) into smaller problems/pieces
  • Solve each smaller piece
  • Compose (combine) those pieces together
  • Solve the bigger problem
  • Abstract away details of smaller pieces
  • Focus on general relationships between pieces

Abstraction: Hide complexity and details

Not have to imperatively state how exactly to do something, every time.

Immutability is an implicit assumption in category theory

  • Existing objects don’t change when doing an action.

  • Otherwise, the math won’t work.

No side effects allowed in “pure” functions

  • E.g. changing values of an existing type.

  • Can only output one type.

Explicit types described in function input and output

f(A: Integer) -> B: String

Same input always equals same output

Predictable and testable.

Functions are also types of objects

Can use them in other functions.

Higher order functions (e.g. for functors) that take a function as an input

Tools like map, filter, reduce.

Benefits of functional programming

Power comes from it’s mathematical foundation

Can construct mathematically provable programs.

Though not provable that the values are correct, just that the input types and output types, the actions and compositions, and the flows are correct.

Declarative (vs imperative) programming

  • Humans naturally think declaratively.

  • Computers operate imperatively.

“What do you want your program to produce?” vs “How do you want your program to produce it?”

The computers at the base need to be imperatively instructed how to do specific tasks.

Low-level languages abstract away the imperative steps

So that we can work declaratively (e.g. we don’t write in assembly or machine code).

Declarative allows us to focus on what we want to solve and the goal

  • Closer match between mental model of the design of solution to problem and the implementation.

  • Action on the object, e.g. “buy groceries” or “pick up the kids”.

Declarative tends to need less code, is more readable

Less code = easier to maintain and read.

Functional design pattern that can be used in many languages

You can take this design and framework for solving problems and use it in many languages.

Predictability and testability

Same output from same input.

Caching and speed

Functions can be memoized (cached when same input is used)

Easier parallel processing with pure functions and maps

List 2

List 1

f() in core 1

f() in core 2

map F

A

A

B

B

Distributed/asynchronous computing: Futures and promises

Abstract containers (types):

  • Future = A value that will be available later

  • Promise = How to get that value later

Use cases in data analysis

Functional programming is very common in data analysis, though often not explicitly recognized.

Foundation of SQL is functional and declarative

Design of databases and the queries are functional.

Excel spreadsheet formulas are functional

Typical data analysis workflow

Decompose into small pieces/steps:

  • Clean data
  • Apply transformations
  • Model data
  • Extract results
  • Visualize results

Keep decomposing, e.g. within cleaning data

  • Convert empty string to missing.
  • Remove missing values.

Cleaning stage would be a “category”

  • Objects (or types) are “data at each step”.
  • Arrows are functions that transform data.

In R, there are few, simple objects/types

  • Most common type of objects are: data.frame, vector, list.

  • Are algebraic data types (e.g. data.frame and list are product types).

  • Are functors (can apply function to internal types).

  • Are (mostly) immutable.

Nice thing with R is that there are only a few core object types to keep track of.

The object/type is almost always data.frame: A collection of column objects/types

The data.frame type doesn’t change, but the internal types do

In Python, it’s more complicated

  • Object types depend on the package you’re use.

  • Objects (classes) are mutable.

  • Functional programming features are mostly an afterthought.

But that doesn’t mean that you can’t design things in a functional way at a high level. Just that some features are either not present, not well implemented, or difficult to use.

Graphically demonstrated with:

empty to missing

remove missing

Data: Raw

Data: Empty as missing

Data: Cleaned

Zoom in: Convert to missingness is a functor (map) step

Data frame

Data frame

f()

f()

map F

Column 1

Column 2

Modified
column 1

Modified
column 2

Applying to other cases: Mapping a model function over a list of formula

List 2

List 1

f()

f()

map F(formula, data)

Model
formula 1

Model
formula 2

Results 1

Results 2

Functional piping is (now) common in R—makes more readable code

result1 = data |> clean() |> transform()
result2 = transform(clean(data))
identical(result1, result2)

Python doesn’t have piping.

Python doesn’t have a very usable piping or chaining feature, though it can be approximated with method chaining.

Data cleaning example in R

removing_missing <- function(data) { ... }
standardize_column_names <- function(data) { ... }
clean_data <- function(data) {
  data |>
    remove_missing() |>
    standardize_column_names()
}
cleaned_data <- clean_data(raw_data)

targets R package: Manage complex analysis pipelines

Requires writing functionally.

File: _targets.R
library(targets)
# ...
list(
  tar_target(file, "data.csv", format = "file"),
  tar_target(data, get_data(file)),
  tar_target(model, fit_model(data)),
  tar_target(plot, plot_model(model, data))
)

Code from targets documentation.

Easy parallel processing with furrr R package

Simple because of functional programming design:

library(furrr)
plan(multisession)
result <- future_map(
  list_of_data,
  transform_function
)

See furrr documentation.

Parallel processing isn’t as easy in Python

Need functional programming in Python to do (multiprocessing).

Use cases in software development

Design stage: Focus on objects and their types

(Not objects in object-oriented programming.)

  • Then build up actions around those objects.

  • Helps build the foundation of the software.

Better testing and predictability

  • No state changes, no side effects.

Make objects with actions, compose together

  • Decompose objects with clear actions on them.

Put objects into containers as functors, use map

  • Build containers for objects to act as functors.

  • Apply functions to all objects via functors.

Big benefit: Emphasis on objects’ types

Strong typing helps enforce correct types and compositions

Rust

fn add(a: i32, b: i32) -> i32 {
    a + b
}

Python

def add(a: int, b: int) -> int:
    return a + b

Python types are not enforced. R doesn’t have these type annotations.

Only one output type per function

Keeps it simple and predictable. So this isn’t functional in Python:

def keep_num_gt_zero(x: int) -> int | None:
    if x > 0:
        return x
    return None

Solve multiple output types with railway-oriented programming

f1

return

return

f2

f3

Input

internal

Type A

Type B

Sum type

Result A

Result B

Example using sum types: Enums in Rust

enum Direction {
    North,
    South,
    East,
    West
}
Direction::North

Direction can only have one value and there are four possible values.

In Rust, you create the object, which defines the type. A type doesn’t need a value. Rust more closely follows functional programming, so this is the object construction.

Example using sum types: Enums in Rust

enum GameResult {
    \\ Integer
    Score(i8),
    \\ Lost
    Lost,
}
GameResult::Score(10)
GameResult::Lost

Either score or lost, but not both.

Sum types: Enums in Python

from enum import Enum
class Direction(Enum):
    North = "North"
    South = "South"
    East = "East"
    West = "West"

Need to add a value to the enum in Python.

Weakly/dynamic typed languages aren’t very good at handling some of the more powerful features of functional programming, but you can still make a lot of use of the concepts of functional programming. R does not have any formal enum.

Monads: A way to handle side effects

  • Essentially is a sum or product type as object.

  • Must implement a flat map function (e.g. “unwraps” a container).

Illustration of a monad and flat map

Sum type

f

M f

Type A

Type B

Type A

Example monad: Option type in Rust

enum Option<T> {
    Some(T),
    None,
}

You don’t need to explicitly create enum Option.

Example monad: Using Option type in Rust

fn divide(n1: i32, n2: i32) -> Option<i32> {
    if n2 == 0 {
        None
    } else {
        Some(n1 / n2)
    }
}
fn add_one(n: Option<i32>) -> i32 { ... }

This is really nice because it forces you to handle None cases explicitly, since not all functions take Option as an input type.

Example: Option type in Python with Optional

from typing import Optional
def divide(n1: int, n2: int) -> Optional[int]:
    if n2 == 0:
        return None
    return n1 // n2

Or use Maybe from returns package

from returns.maybe import Maybe, Some, Nothing
def divide(n1: int, n2: int) -> Maybe[int]:
    if n2 == 0:
        return Nothing
    return Some(n1 // n2)

returns package docs.

Example of handling errors: Result type

At least two possible values: Success and its type or Error

enum Result<T, E> {
    Ok(T), // Successful result
    Err(E), // Why error occurred
}

Easy to see errors by looking at function signature with Result

fn read_file(path: String) -> Result<String, String> {
    // ... Pseudo-code
    if file_contents {
        Ok(file_content)
    } else {
        Err("File not found".to_string())
    }
}

Use Result as “railway” intersection

Need to convert Result to String before using it.

fn process(data: String) -> String { ... }
read_file("file.txt")
    .expect("Didn't read file.")
    .process()
read_file("file.txt")?.process()

Explicitly handle Result with match

let file = read_file("file.txt")
match file {
    Ok(content) => process(content),
    // Crash program
    Err(error) => panic!("{}", error),
}

Can use Results from returns package in Python

Summarising

Category theory

  • Objects and actions
  • Types as sets of values
  • Sum types and product types
  • Composition of objects and actions
  • Functors: Functions used in a container (e.g. map)
  • Monads: Unwrap a container (e.g. flat map)

Functional programming

  • Declarative programming
  • Design focus via composition, abstraction, flow
  • Same input = same output, no side effects
  • Functions are objects
  • Objects are immutable
  • Higher order functions (functors like map)
  • Monads for side effects (e.g. Option, Result)

Resources

  • https://bartoszmilewski.com/2014/10/28/category-theory-for-programmers-the-preface/
  • https://kindafunctional.com/index.html
  • https://en.wikipedia.org/wiki/Functional_programming
  • https://en.wikipedia.org/wiki/Category_theory
  • https://en.wikipedia.org/wiki/Monad_(functional_programming)
  • https://algocademy.com/blog/understanding-monads-in-functional-programming-a-comprehensive-guide/
  • https://www.turingtaco.com/functors-the-key-to-scalable-functional-code/
1 / 88
A structured approach to writing code for software and analyses Luke Johnston 2025-10-23

  1. Slides

  2. Tools

  3. Close
  • A structured approach to writing code for software and analyses
  • Outline
  • Category theory
  • What is category theory?
  • Based on abstractions and compositions
  • Similar to Graph Theory
  • A full graph is a category
  • Objects can be anything
  • Objects can contain other objects
  • Types (of an object) are math objects with specific properties
  • Types are sets of values or other types
  • Express types with object: type notation
  • Algebraic data types: Bigger container types
  • Algebraic data types: Sum types
  • Algebraic data types: Product types
  • Types are necessary for actions
  • Actions are math transformations
  • Simplified syntax for actions
  • Actions are composable
  • Composing via piping
  • Graphically demonstrated with:
  • Functors: A type of algebraic structure
  • Functor example: Map over a list
  • Graphically demonstrated with:
  • Category theory in computers: Functional programming
  • Programming: Art of managing the complexity of solving a problem
  • (De-)composition by breaking down a problem
  • Abstraction: Hide complexity and details
  • Immutability is an implicit assumption in category theory
  • No side effects allowed in “pure” functions
  • Explicit types described in function input and output
  • Same input always equals same output
  • Functions are also types of objects
  • Higher order functions (e.g. for functors) that take a function as an input
  • Benefits of functional programming
  • Power comes from it’s mathematical foundation
  • Declarative (vs imperative) programming
  • Low-level languages abstract away the imperative steps
  • Declarative allows us to focus on what we want to solve and the goal
  • Declarative tends to need less code, is more readable
  • Functional design pattern that can be used in many languages
  • Predictability and testability
  • Caching and speed
  • Easier parallel processing with pure functions and maps
  • Distributed/asynchronous computing: Futures and promises
  • Use cases in data analysis
  • Foundation of SQL is functional and declarative
  • Excel spreadsheet formulas are functional
  • Typical data analysis workflow
  • Keep decomposing, e.g. within cleaning data
  • Cleaning stage would be a “category”
  • In R, there are few, simple objects/types
  • In Python, it’s more complicated
  • Graphically demonstrated with:
  • Zoom in: Convert to missingness is a functor (map) step
  • Applying to other cases: Mapping a model function over a list of formula
  • Functional piping is (now) common in R—makes more readable code
  • Data cleaning example in R
  • targets R package: Manage complex analysis pipelines
  • Easy parallel processing with furrr R package
  • Parallel processing isn’t as easy in Python
  • Use cases in software development
  • Design stage: Focus on objects and their types
  • Better testing and predictability
  • Make objects with actions, compose together
  • Put objects into containers as functors, use map
  • Big benefit: Emphasis on objects’ types
  • Strong typing helps enforce correct types and compositions
  • Only one output type per function
  • Solve multiple output types with railway-oriented programming
  • Example using sum types: Enums in Rust
  • Example using sum types: Enums in Rust
  • Sum types: Enums in Python
  • Monads: A way to handle side effects
  • Illustration of a monad and flat map
  • Example monad: Option type in Rust
  • Example monad: Using Option type in Rust
  • Example: Option type in Python with Optional
  • Or use Maybe from returns package
  • Example of handling errors: Result type
  • Easy to see errors by looking at function signature with Result
  • Use Result as “railway” intersection
  • Explicitly handle Result with match
  • Can use Results from returns package in Python
  • Summarising
  • Category theory
  • Functional programming
  • Resources
  • f Fullscreen
  • s Speaker View
  • o Slide Overview
  • e PDF Export Mode
  • r Scroll View Mode
  • ? Keyboard Help