Bidirectional Typechecking

May 4th, 2026

In 2013, Dunfield and Krishnaswami published “Complete and Easy Bidirectional Typechecking for Higher-Rank Polymorphism”. This past week, I implemented their algorithm twice. The first implementation tried to stay faithful to their paper, while the second tries to use some standard tricks to optimize the implementation.

The code is available on GitHub. I also used my parser combinator library, which made the frontend simple to implement.

DK System Demo

This demo allows you to synthesize types from a given expression. (If synthesis fails, you can check the console to see where the derivation failed.)

The syntax accepted in the demo is fairly standard, but it's probably easier to start by loading examples. The parser handles both unicode symbols like λx:∀α.α → α.x and ascii like \x:forall T. T -> T.x.

λx.x

Canonical example of a polymorphic function

λf.f (f ())

Fully constrains the type of f

λx.λf.f (f x)

λx.λy. fst(<x, y>)

Pairs and projection

(λx.x x) : (∀α.α → α) → (∀β.β → β)

The "Ω" function

let x = 1 in x + 1

Let syntax is supported, but has no special meaning

Display and Input options

Gentzen Style Pretty Types

Background

System F

In the Simply Typed Lambda Calculus (STLC), you can write expressions like this (λx. x) : Int → Int, which says “I have a function that takes in an integer x, and returns an integer.” The meaning of “simple” types will be clearer later, but essentially the only types we have are basic types (say Int and Bool), or functions between those types, like Int → Bool or (Int → Int) → ((Int → Int) → Bool).

Something you might notice from the example above is that the type annotation is not unique. I could also have written (λx. x) : Bool → Bool or even (λx. x) : (Int → Bool) → (Int → Bool). This means that we can't figure out what the type should be, based on the definition λx. x alone.

This lack of unique types means that we can't “reuse” our functions. (λx. x) : Int → Int is a function from integers to integers. (λx. x) : Bool → Bool is a function from booleans to booleans. Even though our function definition is exactly the same, our type system restricts us from treating them as the same. In a real programming language, this is a significant problem, and various languages have tricks to overcome them, like overloading.

In the 1970s, Girard and Reynolds independently defined the System F type system and a second-order intuitionistic propositional logic respectively. (Read more about the equivalence of type theories and logics here). In this type system, we can express much more complex types, like ∀α.(α → α). Types like this are called quantified types, and ∀ is called a universal quantifier. This is kind of like a function for types. It says given any type α, we “produce” a function type α → α. In other words, it's very much like the type equivalent of λx. x. In fact, this is the type of λx. x.

But doesn't this still have the problem of non-unique types? Isn't it still the case that (λx. x) : Int → Int is a valid typing? Yes, so I was lying a bit when I said it was “the” type. However, it is something called the principal, or “most general” type. In other words, every possible valid typing of λx. x is an instance of ∀α.(α → α). For example Int → Int is what we get when we substitute α with Int, and (Int → Bool) → (Int → Bool) when α = Int → Bool, etc. So our “function but for types” analogy is more than an analogy. This is like function application of types. In fact it is referred to as type application.

This expressivity lets us ask a more precise question. Given just an expression, can we automatically infer the principal type? The answer turns out to still be no. While we now have principal types for some terms, not every term in System F has a principal type. So, if a term does have a principal type, can we at least infer those? Still, the answer is no! But this wasn't fully proven until 1999 by Wells.

Damas, Hindley, and Milner

Long before Wells' proof, the Damas-Hindley-Milner (HM) type system was developed and used in languages like ML. The HM system is a restriction of System F which allows for full type inference. Meaning a programmer write no type annotations, and yet the entire program can be completely type checked! So how did they make this practical?

First, since the programmer does not write annotations, there is much more control over the kinds of types that can be inferred. They also introduced the concept of “let polymorphism” where quantified types can only be introduced in expressions like let y = λx.x in x, where the type of y is inferred as the quantified type ∀α.(α → α). This gives us a precise way to control how and when polymorphic types can appear.

In retrospect, the existence of the HM type system is surprising. Slightly weaker type system? No principal types. Slightly stronger type system? Too complex, no decidable typeability.

HM System Demo

This demo allows you to infer types from a given expression within the Damas-Hindley-Milner system.

The syntax is basically the same, with the main difference that there are no type expressions in the syntax. Everything is inferred. Let expressions are also no longer just syntactic sugar as they are the primary mechanism for polymorphism. Note that the the terms are using de Bruijn indices, so λx.x appears as λ.0.

λx.x

Canonical example of a polymorphic function

λf.f (f ())

Fully constrains the type of f

λx.λf.f (f x)

λx.λy. fst(<x, y>)

Pairs and projection

let id = (λx.x) in let y = (id ()) in let z = (id (λu.u)) in z

Let polymorphism

Display and Input options

Gentzen Style

Dunfield and Krishnaswami

People were already fairly confident that full type inference for System F was not possible even before 1999. So the question becomes, how and where do we require annotations in a way that isn't intrusive, but still take advantage of the power of System F?

Since the late 1980s or so, several people had worked on something called bidirectional typechecking. Instead of just an inference algorithm, in which you start with a term and try to derive a type, the process is split into two “modes” called synthesis and checking. Synthesis is basically inference, when you construct an unknown type from known information. Checking is when you have a term and a type, and want to confirm that the term is compatible with that type.

HM can also be interpreted in this light. While any type inference system is going to trivially have only synthesis rules. But in practice, the traditional HM implementation is to gather constraints during inference, which are then “checked” when trying to solve those constraints through unification. A bidirectional system essentially includes this as part of the type theory directly.

One popular bidirectional type system is Dunfield and Krishnaswami's 2013 system from above. They had several innovations. The first was to add a 3rd mode. I won't go into detail, but it's a kind of inferred type application. This allows the system to avoid requiring explicit type application as a part of the syntax. Another was their algorithmic context management. In other words, how exactly to keep track of where things first appear, their scope, and when they are valid to use.

Notes and Future Work

In Jinxu Zhao's 2021 thesis, they formalize Dunfield and Krishnaswami's proofs. But they also noted that there were several mistakes that had to be repaired. I believe I've discovered another mistake in the DK paper, which is minor or egregious depending on your point of view. In Figure 12, they show an incorrect example derivation. While their conclusion is unchanged, this was quite confusing when I was implementing, as you can imagine. So here is the corrected derivation.

Γ′ = Γ[β̂₂, β̂₁, α̂ = (β̂₁ → β̂₂)]
Δ = Γ[β̂₂, β̂₁ = β̂₂, α̂ = (β̂₁ → β̂₂)]

──────────────────────────────────── InstLReach  ────────────────────────────────────────── InstRReach
Γ′, ›β̂, β̂ ⊢ β̂₁ :≤ β̂ ⊣ Γ′, ›β̂, β̂ = β̂₁             Γ′, ›β̂, β̂ = β̂₁ ⊢ β̂₁ ≤: β̂₂ ⊣ Δ, ›β̂, β̂ = β̂₁
─────────────────────────────────────────────────────────────────────────────────────────── InstRArr
                    Γ[α̂], ›β̂, β̂ ⊢ (β̂ → β̂) ≤: α̂ ⊣ Δ, ›β̂, β̂ = β̂₁
                    ────────────────────────────────────────── InstRAllL
                            Γ[α̂] ⊢ ∀β.(β → β) ≤: α̂ ⊣ Δ

Figure 12 (corrected)

Another change I'd like to make is around the presentation of the derivations. Right now, the generated derivations can get very big because it shows the entire context every time. A better way would be to only report changes in context between steps.

Logical interpretation

As mentioned earlier, System F is the type theoretic interpretation of second-order intuitionistic propositional logic. So what are HM and DK as fragments of this logic?

In System F, we allow quantified types to be substituted with any type, including other quantified types. For instance, if we substitute α = ∀β.β → (∀γ.β → γ) in ∀α.(α → α) we get (∀β.β → (∀γ.β → γ)) → (∀β.β → (∀γ.β → γ)). But now we can even substitute in types for β etc. We can even substitute in the type itself! So α = ∀α.(α → α) to get (∀α.(α → α)) → (∀α.(α → α)).

This ability to substitute any type is called impredicativity. The logical interpretation of this is that we allow quantification over arbitrary types, including the type itself. If we restrict what kinds of types can be substituted, we get a predicative system. This terminology dates all the way back to Russell's original formulation of a type theory.

Another way to restrict our system is by restricting something called the “rank”. A rank-0 type is just one in which no quantifiers appear. These are usually called monotypes. A rank-1 type is one in which all quantifiers must appear at the very beginning/outermost position. This is usually called prenex form.

In HM, we require both predicativity and prenex form (sometimes called prenex polymorphism). In first-order logic, every formula can be rewritten in prenex form. But this is not always true in an intuitionistic logic, making this a fairly strong restriction on the types which can be inferred. But it still appears to cover a good number of practical cases, since languages like OCaml are still popular.

In DK, we also require predicativity. But we need to be a bit more careful about distinguishing which part has which restriction. The pure type synthesis mode, we also only infer prenex form. But, the key difference is that we also allow arbitrary type annotations. Because we can annotate with higher-rank types, the type system as a whole can handle arbitrary types. The key restriction is in what kind of types can be substituted in for a quantified type, and what kinds of types are inferred in the absence of annotations.

Technical Details

While my first implementation tries to stay as true to the paper as possible, I did allow myself to add a few extra rules to handle type annotations on parameters, product types, pairs and projections. They are just too convenient to not include. As far as I can tell, my rules make sense, but I did not attempt to try to prove anything. But they are just an extension, which means that if your term doesn't contain any of this extra syntax, these extra rules will not make any difference to the derivation.

The "pretty types" are a composition of generalization and so-called miniscoping. Generalization is simple. Find all of the unsolved existential type variables, and add corresponding quantifiers in front. Miniscoping is also conceptually simple but trickier than expected. Now that we have a bunch of quantifiers in the front, we can push them in as far as possible using the following transformation ∀α.(A → T(α)) ⇔ A → (∀α.T(α)) if α does not appear free in A and T(α) ≠ α This is basically an "un-prenex" form. The purpose is mostly to highlight the differences from HM and show higher-rank types.