A response to an undergrad - What do I study?

An undergrad I was a T.A. for recently asked me what my research is about. I decided to write up an answer as a blog post, in a way that is accessible to the average early undergrad student in STEM.

I study logic. More specifically, I study semantics. In linear algebra, you start off with a certain formal algebraic language, the language of vector spaces - it has certain distinguished constant symbols you are permitted to use, like "0" for the zero vector, or a symbol for each real number (the scalars); you are permitted to use variables v, w, ... that range over the vectors; and there are special symbols that represent functions, like "+" for addition. There are also certain axioms, written in this language, and a list of formal logical laws that allow you to combine the axioms in various ways to derive theorems in the theory of linear algebra. Linear algebra, from this point of view, is the collection of all formal statements in the language of linear algebra that can be derived from the axioms under the accepted logical laws. All of this is considered "syntax" - it is the purely formal, textual part of mathematics. This can all be coded into a computer system - a computer can check whether a sentence in the language of linear algebra is grammatically correct, or whether a proof is correct (i.e., whether the logical laws are applied correctly and in the right order).
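
To make the "this can all be coded into a computer" point concrete, here is a tiny sketch of the syntactic side - my own toy encoding in Haskell, not any standard library - in which terms and formulas of the language of vector spaces are just data that a program can inspect and check without knowing anything about what they mean.

```haskell
-- A toy encoding of the *syntax* of the language of vector spaces.
import qualified Data.Set as Set

type Var = String

-- Terms: the zero vector, variables, sums, and scalar multiples.
data Term
  = Zero
  | V Var
  | Add Term Term
  | Scale Double Term
  deriving (Show)

-- Formulas: equations between terms, connectives, and quantifiers.
data Formula
  = Equals Term Term
  | Not Formula
  | And Formula Formula
  | ForAll Var Formula
  deriving (Show)

-- Variables occurring in a term.
termVars :: Term -> Set.Set Var
termVars Zero        = Set.empty
termVars (V x)       = Set.singleton x
termVars (Add s t)   = termVars s `Set.union` termVars t
termVars (Scale _ t) = termVars t

-- Free (unquantified) variables of a formula.
freeVars :: Formula -> Set.Set Var
freeVars (Equals s t) = termVars s `Set.union` termVars t
freeVars (Not p)      = freeVars p
freeVars (And p q)    = freeVars p `Set.union` freeVars q
freeVars (ForAll x p) = Set.delete x (freeVars p)

-- A "sentence" is a formula with no free variables: a purely
-- grammatical check that never asks what the formula is about.
isSentence :: Formula -> Bool
isSentence = Set.null . freeVars

-- Example: "for all v, v + 0 = v" is a sentence; "v + 0 = v" alone is not.
main :: IO ()
main = do
  let body = Equals (Add (V "v") Zero) (V "v")
  print (isSentence (ForAll "v" body))  -- True
  print (isSentence body)               -- False
```

Being a grammatically correct sentence is a purely mechanical property of the text, which is exactly what lets a computer check it.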

But this is only half of it. The subject of linear algebra only exists because we think of these axioms, constants, function symbols, and theorems as having meaning. From this point of view, a vector space is a mathematical structure which serves as a "model" for the theory: it is a set V of elements, which we call vectors, in which the various constants ("0", ...) of the language can be interpreted and given meaning; in which the function symbols ("+", ...) can be interpreted as actual functions; and in which the axioms and theorems can be translated into statements about the structure, where the axioms hold true and the theorems tell us something about the structure in question. Semantics imbues syntax with content. Without actual structures to talk about and describe, syntax is meaningless. The notion of "provable/unprovable" belongs to syntax; the notion of "true/false" belongs to semantics.
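
Here is the matching sketch of the semantic side, again as a toy Haskell example of my own: the class declaration plays the role of the bare theory - it only names the distinguished constant and function symbols - while an instance is a particular model that gives those symbols actual meaning.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- The "theory": it names a distinguished constant and two operations,
-- but says nothing about what they actually are.
class VectorSpace v where
  zeroV  :: v                 -- interpretation of the constant "0"
  addV   :: v -> v -> v       -- interpretation of the symbol "+"
  scaleV :: Double -> v -> v  -- interpretation of scalar multiplication
  -- The axioms (addV is commutative, zeroV is an identity, ...) are not
  -- enforced by the compiler; a "model" is an instance that satisfies them.

-- One concrete model: R^3 as triples of doubles with componentwise structure.
instance VectorSpace (Double, Double, Double) where
  zeroV = (0, 0, 0)
  addV (a, b, c) (x, y, z) = (a + x, b + y, c + z)
  scaleV k (x, y, z) = (k * x, k * y, k * z)

-- Spot-check the axiom "v + 0 = v" at one particular point of this model.
main :: IO ()
main = do
  let v = (1, 2, 3) :: (Double, Double, Double)
  print (addV v zeroV == v)  -- True
```

The compiler cannot check the axioms for us; whether a given instance really satisfies them is precisely the question of whether it is a genuine model.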

The study of semantics is also of some technical use to the study of syntax. One of the worries a mathematician might have when seeing the theory of linear algebra for the first time is: what if, from the axioms, it is possible to derive an absurdity - like "There exists some vector v which is not equal to itself" - or even to derive every sentence in the language whatsoever, both "P" and "Not P" for every possible sentence P? This would clearly render the whole theory garbage, so it is important to work out a way to assure oneself that this does not happen. A good way to do this is by working out the relationship between syntax and semantics. It is a standard part of an introductory logic class to develop an interpretation of each formula and sentence in the language of linear algebra in each particular vector space, where it can then be assigned a value of "true" or "false" in the context of that particular vector space; it is then possible to show that if a sentence P is provable from the vector space axioms, it will be true when you interpret it in each particular vector space. We refer to this as the soundness of the logic. This is, of course, as we would hope! But it is still necessary to show it. (Question: What do you think about the converse direction, which is called completeness? Suppose that some sentence P holds true in every vector space in existence. Must there necessarily be a proof of P from the axioms? Or is it the case that whenever P is unprovable, there necessarily exists a counterexample to P? Could it be the case that there are true statements about all vector spaces which nevertheless cannot be proven to be true?)
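
Written out symbolically - with T standing for the vector space axioms and P for a sentence of the language - soundness and the completeness question look like this:

```latex
\text{Soundness: } \quad T \vdash P \;\Longrightarrow\; T \models P
\quad \text{(provable from the axioms implies true in every vector space)}

\text{Completeness? } \quad T \models P \;\Longrightarrow\; T \vdash P
\quad \text{(true in every vector space implies provable from the axioms)}
```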

Once this is done, you can use provability to figure things out about truth, and vice versa. Using the theorem I just mentioned - that if P is provable, then it is true in every vector space - here is a proof that the sentence "There exists a vector v which is not equal to itself" is not provable from the vector space axioms.

1. If the sentence "There exists a vector v which is not equal to itself" is provable from the axioms, then it is true when interpreted in every vector space.
2. Take the specific vector space (R^3, 0, +), equipped with its usual structure. If the sentence above is provable from the axioms, it is true when interpreted in this vector space.
3. But each vector in R^3 is equal to itself, so the sentence is false in this vector space; hence it cannot be provable.

Of course, this is just the notion of a "counterexample", which only makes sense in light of the relationship between truth and provability commented on above.
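
In the notation above, the argument is just the contrapositive of soundness, applied to the single model R^3:

```latex
T \vdash \exists v\,(v \neq v) \;\Longrightarrow\; \mathbb{R}^3 \models \exists v\,(v \neq v),
\qquad \text{but} \qquad \mathbb{R}^3 \not\models \exists v\,(v \neq v),
\qquad \text{so} \qquad T \nvdash \exists v\,(v \neq v).
```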

I could go on about many of the key results of classical logic - Gödel's incompleteness theorems for the formal theory of arithmetic, Cantor's diagonal argument that there are "multiple infinities" and Hilbert's hotel, the unsolvability of the halting problem, Gödel's completeness theorem for the set-theoretic semantics of first-order logic, and so on. I am sure most of these have very nice YouTube explainers by Numberphile or 3blue1brown or someone like that; they are absolutely worth checking out sometime. But I should move on for the moment.

Now, you might think all this is a little silly, since we are talking about very simple theories, where the relationships are perhaps so obvious that they are not worth commenting on. However, as the formal language one wants to work in gets more and more complicated, the problem of actually figuring out what its semantics is grows more and more difficult. The reason you might want a more complicated language is that it is more expressive, and easier to work in and to translate your proofs into. It could even be the case that the new language lets you say things or phrase arguments that could not be phrased at all in the old language. This is not really a pressing problem for mathematicians, because in practice they don't actually ever translate their arguments into formal systems. There is a formal system called ZFC set theory + first-order logic, which is pretty simple in a technical sense, into which all of existing mathematics could theoretically be translated. However, it is so simple (i.e., technically non-expressive) that doing so would be excruciating in actual practice, and far more work than it would be worth. Mathematicians pull this sleight of hand where they tell their students "this can be rigorously justified in the language of ZFC set theory" when this would be such an unfeasibly monumental task that their bluff can never be called. The benefit is that first-order logic has a very easy-to-understand semantics (meaning), and it is easy to study, which has given rise to a huge field of mathematical logic, called model theory, that studies first-order logic.

However, there is a group of workers who actually do work in complex formal languages all the time: programmers. A programming language is a formal system. And because programmers want their jobs to be easier, and their code to be better and cleaner and easier to understand and less buggy, they are constantly clamoring for better, more expressive programming languages in which it is easier to express their thoughts. (This is a total lie that I'm telling for the purpose of my story. Programmers are human beings, and human beings hate change, so you have to force new programming languages and new ideas in programming languages down their throats half the time. Also, thinking is hard, and if you give people languages in which it's possible to express richer and more complex ideas, they might just be like "well, having richer and more complex ideas seems like a lot of work." This isn't even me being cynical; I don't like thinking hard more than a few hours a day either!)

So programming languages are much more complicated than mathematical languages, and the semantics of a programming language - what is "meant" by all the types and terms and polymorphic operators and so on - is much harder to specify. This is what I study: the semantics of programming languages, how to use it as a tool to reason about programs written in a language, and how to prove things about programming languages themselves.

Some developments in the mathematics of the past century have helped to clarify how this should be done. A modern perspective on linear algebra is that it is equally important to study both vector spaces and the linear transformations between them. A great deal of the structure of vector spaces is reflected in the study of the maps which preserve that structure. Take, for example, dimension, which is intimately related to the question of whether there exists an injective linear transformation between two spaces (for finite-dimensional spaces, an injective linear map V -> W exists exactly when dim V <= dim W).

One school of thought is that linear algebra is really the study of a huge, interconnected network or directed graph, whose nodes are all of the vector spaces, and whose edges or links are the linear transformations between them - a huge hairball of a network, massively entangled. This hairball is called the category of vector spaces and linear transformations. Category theory is a field which studies branches of math through their "hairball", trying to see how much of the subject can be understood, phrased, or argued from the point of view of the network. For example, whenever you have two sets, each equipped with a notion of distance between points (called a metric), as R^n is, you can formulate the epsilon-delta definition of continuity, and thus define what it means for a function between them to be continuous. If we call a set equipped with a distance function a metric space, then there is a category whose nodes are all the metric spaces and whose edges are the continuous maps between them.
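
For the curious, here is the standard textbook definition hiding behind the hairball picture (nothing specific to my research):

```latex
\text{A category consists of: objects } A, B, C, \ldots \text{ (the nodes);} \\
\text{arrows } f : A \to B \text{ between objects (the edges);} \\
\text{an identity arrow } \mathrm{id}_A : A \to A \text{ for each object } A; \\
\text{and a composition sending } f : A \to B \text{ and } g : B \to C \text{ to } g \circ f : A \to C, \\
\text{satisfying } \; f \circ \mathrm{id}_A = f = \mathrm{id}_B \circ f
\quad \text{and} \quad h \circ (g \circ f) = (h \circ g) \circ f.
```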

My research deals with trying to understand a programming language as having an associated "category", whose nodes are the types (integers, booleans, arrays, structs, ...) and whose arrows are the functions between them that can be written in the language. The semantics of the programming language - what a program "means" - is what I try to express in terms of this graph. As the programming language grows more complex, so do the structural requirements on the associated graph. This is categorical logic.
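
Here is a minimal sketch of this point of view, using Haskell itself as the programming language (the function names are illustrative placeholders I made up): objects are types, arrows are functions, the identity arrow is id, and composition is (.).

```haskell
-- Two arrows of the category of Haskell types and functions.
parse :: String -> Int      -- an arrow from the object String to the object Int
parse = length              -- (a stand-in implementation)

isBig :: Int -> Bool        -- an arrow from Int to Bool
isBig = (> 10)

-- Composing arrows gives another arrow, from String to Bool.
pipeline :: String -> Bool
pipeline = isBig . parse

-- The category laws, which any semantics of the language must respect:
--   id . f      == f
--   f . id      == f
--   (h . g) . f == h . (g . f)
main :: IO ()
main = do
  print (pipeline "hello world!")            -- True (length 12 > 10)
  print ((id . parse) "abc" == parse "abc")  -- an identity law, checked at one point
```

These laws are only the first structural requirements; richer language features (state, exceptions, recursion, polymorphism, ...) demand correspondingly more structure on the category.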

I will attach some notes if you're curious. Be warned that category theory is infamous for how abstract it is; it makes linear algebra and calculus look downright concrete. Most students of mathematics don't study it until grad school, when they have picked up enough of the terrain of mathematics that they are able to understand the purpose of all this abstraction and have seen enough examples that they can root the abstraction in their own personal experiences.

There are a couple of very closely related areas which I also find interesting. Often, when dealing with a specific concrete problem or class of problems, it becomes necessary or useful to introduce a customized logic which reflects the subtleties of the issue at hand. For example, if you were investigating some dynamical system evolving in time, you might find it useful to invent a logical system in which time is an explicit part of the logic, with "next", "now", "later", "eventually", and "always" as modifiers. Instead of just "P(x)" or "not P(x)", you could have "eventually P(x)" or "always P(x)" or "never P(x)". Similarly, you could design a logical system which allows you to deal with incoming information that could change your perspective: in addition to firmly concluding "P(x)" or "not P(x)", you could add qualifiers to deal with the effects of new information not yet seen - "necessarily P(x)" or "possibly P(x)". This is called modal logic.
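
To make the temporal flavor concrete, here is a tiny Haskell sketch of my own (nothing standard being quoted) in which the modifiers become functions acting on finite traces of states:

```haskell
-- A trace is a finite record of the states of a system over time.
type Trace a = [a]

-- "now P": P holds of the current (first) state.
now :: (a -> Bool) -> Trace a -> Bool
now p (x:_) = p x
now _ []    = False

-- "next P": P holds of the state one step later.
next :: (a -> Bool) -> Trace a -> Bool
next p (_:xs) = now p xs
next _ []     = False

-- "eventually P": P holds at some state of the trace.
eventually :: (a -> Bool) -> Trace a -> Bool
eventually = any

-- "always P": P holds at every state of the trace.
always :: (a -> Bool) -> Trace a -> Bool
always = all

main :: IO ()
main = do
  let trace = [3, 1, 4, 1, 5, 9] :: Trace Int
  print (eventually (> 4) trace)  -- True: the states 5 and 9 exceed 4
  print (always     (> 0) trace)  -- True: every state is positive
  print (now        even  trace)  -- False: the current state is 3
```

Real temporal logics work over infinite executions rather than finite traces, but this already shows "eventually" and "always" acting as modifiers of an ordinary property.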

The central method of modern set theory revolves around such a custom logic: a logic dedicated to building and constructing things in stages and reasoning about them as they are built. So the logic doesn't just deal with a set of "things"; it has both the "things" and partial approximations to, or partial specifications of, those things. The logic is meant to help relate what we know about the object we're building at finite initial stages to the properties the finished object will have. This is called "forcing logic."
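
The simplest example of this setup (a standard one, heavily simplified): the "thing" being built is an infinite sequence of 0s and 1s, and the partial approximations are finite sequences, ordered by extension. A finite approximation already settles some statements about the final object and leaves others open:

```latex
p = 0110 \;\text{ already decides ``the 3rd bit is 1'',} \\
\text{but leaves ``some later bit is 1'' to be decided by further extensions of } p.
```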

There are also logics that arise in geometry. If you had some space like R^n, and were studying functions on it (say, continuous functions from R^n into the reals), there might be a difference between "f : R^n -> R has property P" and "f : R^n -> R has property P locally", i.e., at every point x in R^n there is some open ball containing x where P holds of f if you restrict the domain of f to that open ball. For example, you could take "f is bounded", i.e. there is some M with |f(x)| < M for all x, versus "f is locally bounded" - for every x there is some open ball U containing x and some M with |f(y)| < M for all y in U. Here, then, "locally" becomes a modifier of the property P in the same way that "eventually P" and "possibly P" modify P, in a way that is unintelligible to classical logic. Call this "geometric logic."
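
Written out, the two properties differ only in where the bound is quantified - in the local version, the bound M (and the ball U) is allowed to depend on the point:

```latex
\text{bounded:} \qquad \exists M \;\; \forall x \in \mathbb{R}^n, \;\; |f(x)| < M

\text{locally bounded:} \qquad \forall x \in \mathbb{R}^n \;\; \exists\, U \ni x \text{ open} \;\; \exists M \;\; \forall y \in U, \;\; |f(y)| < M
```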

Here is another example. In homework you often argue as follows: "We will show that this system of equations has a solution. Suppose, for the sake of a contradiction, that it does not... (proof proof proof)... but this is impossible, because we can derive this absurdity, 0 = 1! So there must, indeed, be a solution to the system of equations." Say an engineer hired you to find a solution to this system of equations, and you delivered them this. What would they say? What use is this to them - it claims that there exists a solution but does not tell them what the solution is. How bizarre! The proof is devoid of numerical meaning! We have shown that classical logic fails to have the following desirable property:
1. Whenever we prove that there exists some x with property P, we construct a specific x with property P.
It also fails to have this (sometimes desirable) property:
2. Whenever we prove that "A or B" is true, we have a proof of A or a proof of B.
Indeed, you have used many times in your homework that "A or not A" must be true, even before you had found a proof of A or a proof of not A.

A constructive logic (or intuitionistic logic) is a logic which drops some of the usually permitted nonconstructive proof techniques, like proof by contradiction or the law of excluded middle, in order to make the statements it proves carry more numerical meaning.
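
A classic illustration of what gets dropped (a standard textbook example, not something of mine): the claim that there exist irrational numbers a and b with a^b rational has a famous two-line classical proof that splits on "A or not A" without ever settling which case holds.

```latex
\text{Case 1: } \sqrt{2}^{\sqrt{2}} \text{ is rational. Take } a = b = \sqrt{2}.

\text{Case 2: } \sqrt{2}^{\sqrt{2}} \text{ is irrational. Take } a = \sqrt{2}^{\sqrt{2}}, \; b = \sqrt{2},
\text{ so that } a^b = \sqrt{2}^{\sqrt{2} \cdot \sqrt{2}} = \sqrt{2}^{\,2} = 2.
```

Either way the claim holds, but the proof never tells us which pair (a, b) actually works - an existence proof with no witness, exactly the failure of the two properties above.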

I am interested in the "meanings" of these logics - geometric logic, forcing logic, modal logic, constructive logic - in making them easier to understand as deductive systems by giving concrete mathematical structures in which we can "interpret" the things they talk about. Are there settings which support a logic very similar to traditional/classical logic, except that the law of excluded middle or other nonconstructive proof techniques may fail? Categorical logic gives ways to attack each of these questions, which all turn out to be related to each other, and I find this very interesting.
