1. Introduction and Motivation
This section introduces the motivation for this paper. It also includes the background that informed our approach to this work.
Neural Tax Networks [1,2] is building a proof-of-technology ‘CTP — Certifiable Tax Prover’ for a multimodal AI system. The first modality is based on an automated theorem-proving system for first-order logic. This is realized by using the logic programming language ErgoAI [3] for proving statements in tax law. The second modality is expected to use LLMs (Large Language Models). The primary application of LLMs is to help users input information into CTP and then help users understand the system’s output or answer. This application of LLMs combined with a first-order logic theorem-proving system is the focus of this paper. This application of LLMs is challenging and it has not yet been fully tried and tested.
This paper leverages logical model theory to better understand the limits of applying these two modalities together. Particularly, this paper gives insight into the limits on combining LLMs and theorem-proving systems into a single system capable of interpreting tax situations given U.S. tax law. This supposes that law, written in natural language, can be accurately represented by first-order logic or even subsets of first-order logic used by theorem-proving systems such as ErgoAI. We make a key assumption: the LLMs work with an uncountable number of meanings over all time. We argue this is not an unreasonable assumption since deeper specialization in new areas of the law or new areas of human understanding often drives the creation of ideas or new meanings. New words are sometimes created for these new meanings, additional meanings may be associated with existing words, or meanings may exist that are not associated with words. Moreover, (1) LLMs are based on very large amounts of data that is occasionally updated, (2) linguistic models such as Heaps’ law [4] allow partial estimates of the number of unique words in a document based on the document’s size, which does not have an explicit bound, (3) new meanings may exist that are not associated with words, for instance from diagonalization, and (4) natural language usage by speakers adds new words, or creates additional meanings of words, over time. At the same time, we argue that the meanings of words must be maintained from when laws are enacted. The common meanings may fall out of use, but they must still be recoverable.
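As a concrete illustration of point (2), Heaps’ law estimates the vocabulary V(n) of a corpus of n word instances as V(n) = K·n^β. The sketch below uses illustrative constants (K and β are within commonly reported ranges, not fitted to any corpus); the estimate grows without bound as n grows.

def heaps_vocabulary(n: int, K: float = 44.0, beta: float = 0.49) -> float:
    """Heaps' law estimate V(n) = K * n**beta of the number of distinct
    words in a corpus of n word instances. K and beta are illustrative
    values within commonly reported ranges, not fitted constants."""
    return K * n ** beta

for n in (10**6, 10**9, 10**12):
    print(f"{n:>15,} word instances -> ~{heaps_vocabulary(n):,.0f} distinct words")

# The estimate has no explicit bound: V(n) grows without limit as n grows.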
Resolution theorem-proving systems have domains that contain at most a countable number of atoms. An atom is an identifier that represents a constant, a string, a name, or a value. Particularly, these words and word meanings are generally established when the rules and facts are set. In addition, for applications of theorem-proving systems to legal reasoning, the rules are often set with specific word meanings in mind.
A special case of the upward Löwenheim–Skolem Theorem indicates that if we have an arbitrarily large number of word or token meanings for our system, then we have an uncountable set of all word or token meanings. This may apply to LLMs, but in some cases it may also apply to theorem-proving systems. Alternatively, we can take a limit over all time, providing an uncountable number of word or token meanings over an infinite number of years for LLMs.
A key question we work on is: Suppose a theorem-proving system (such as a system designed to analyze U.S. tax law) does not add any new rules while new word or token meanings are added to the system’s domain by an LLM. This LLM essentially “feeds” the words or phrases represented by LLM tokens into the theorem-proving system. For example, suppose a theorem-proving system based on U.S. tax law is required to answer a question input by the user using a word tokenized by the LLM as having a certain meaning. A user of our system may use contemporary jargon, whereas laws may remain expressed in the terminology from when the laws were enacted. Similarly, suppose a new word or a new meaning of an existing word is not in the text of the tax law that otherwise would apply to the user’s question. In this case, the theorem-proving system’s rules would work the same for the original domain, that is, the domain of words contained in the tax law when it was enacted, while allowing new meanings for words, word fragments, or punctuation that are tokenized by an LLM to work with the same set of rules in the theorem-proving system. We are not giving an effective method for handling new meanings; rather, we are saying one exists. Furthermore, our analysis assumes an uncountable domain of word, atom, or token meanings, a subset of which may be supplied to the theorem-proving system by the LLMs. We then apply the Löwenheim–Skolem Theorems and logical model theory.
A novel application of the Löwenheim–Skolem Theorems and elementary logical model equivalence, see Theorems 7 and 8, indicates that the same rules of the theorem-proving system can still work even when an uncountable number of word, atom, or token meanings are in the theorem-proving system’s domain. A similar idea comes from the Lefschetz Principle of first-order logic [5]. This principle indicates the same first-order logic sentences can be true given specific logical models with different cardinalities. For instance, one logical model may have a countably infinite cardinality and another may have an uncountable cardinality. Yet, the same set of first-order logic sentences can hold true using these very different logical models.
1.1. Technical Background
LLMs are based on [6]. Discussion of LLMs can be found in individual review articles, such as [7,8]. The current paper builds on theoretical foundations, so it covers general capabilities of LLMs rather than discussing any specific LLM.
This paper seeks insight on multimodal AI systems based on the foundations of computing and mathematics. To do so we analyze the effects of the combination of a first-order logic theorem-proving system with LLMs. Combining theorem-proving, which is a classical AI methodology, with newer AI/ML methods for LLMs highlights the multimodality of our system. The multimodality of our system is neurosymbolic since it combines deep learning from LLMs and symbolic theorem-proving. Such neurosymbolic combinations have been fruitful [9].
LLMs have been augmented with logical reasoning [10]. Our work uses logical reasoning systems augmented by LLMs. There has also been work on the formalization of AI based on the foundations of computing. For example, standard notions of learnability can neither be proven nor disproven [11]. This result leverages foundations of machine learning and a classical result on the independence of the continuum hypothesis. The Turing test can be viewed as a version of an interactive proof system. This view maps the Turing test to key foundations of computational complexity [12].
1.2. Technical Approach
The domain of an LLM is the specific area of knowledge or expertise that the LLM is focused on. It is essentially the context in which the LLM is trained and applied. Thus, the domain determines the types of tasks the LLM can perform and the kind of information it can process and generate.
Applying LLMs is the next phase of our proof-of-technology. We already have basic theorem-proving working in ErgoAI with several tax rules. Given appropriate facts with these rules, the current system derives the expected results. It is the practical combination of these modalities that will allow complex questions to be precisely answered. Ideally, given the capabilities of LLMs, users will be able to easily work with the system.
Many theorem-proving systems supply answers with explanations. In our case, these explanations are proofs from the logic-programming system ErgoAI.
ErgoAI is an advanced multi-paradigm logic-programming system by Coherent Knowledge LLC [3,13]. It is Prolog-based and includes non-monotonic logics, among other advanced logics and features. For instance, ErgoAI supports defeasible reasoning. Defeasible reasoning allows a logical conclusion to be defeated by new information. This is sometimes how law is interpreted: if a new law is passed, then any older conclusions that the new law contradicts are disallowed.
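A minimal sketch of this defeat pattern, in Python with hypothetical rule records (ErgoAI provides defeasible reasoning natively; this only illustrates the idea of a newer rule defeating an older conclusion):

# Hypothetical rule records: a conclusion from an older rule is defeated
# by a newer rule that concludes its negation. ErgoAI supports this kind
# of defeasible reasoning natively; this sketch only illustrates the idea.
rules = [
    {"year": 1986, "conclusion": ("deductible", "entertainmentExpense")},
    {"year": 2017, "conclusion": ("notDeductible", "entertainmentExpense")},
]

def surviving_conclusions(rules):
    """Keep a rule's conclusion unless a newer rule concludes its negation."""
    negate = {"deductible": "notDeductible", "notDeductible": "deductible"}
    kept = []
    for r in rules:
        status, subject = r["conclusion"]
        defeated = any(
            s["year"] > r["year"] and s["conclusion"] == (negate[status], subject)
            for s in rules
        )
        if not defeated:
            kept.append(r["conclusion"])
    return kept

print(surviving_conclusions(rules))  # [('notDeductible', 'entertainmentExpense')]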
1.3. Structure of This Paper
Section 2 discusses select previous work on multimodality and the law. It briefly reviews expert systems and approaches where LLMs can perform some level of legal reasoning.
Section 3 gives a logic background illustrated by select ErgoAI statements. This section shows ErgoAI (Prolog) has at most a countably infinite number of atoms that may be substituted for variables.
Section 4 introduces logical model theory to give a high-level view of how our system works.
Section 5 discusses the upward and downward Löwenheim–Skolem theorems as well as elementary equivalence from logical model theory. We give results based on these foundations to show (1) LLMs and theorem-proving systems have uncountable models under certain circumstances, and (2) models of both LLMs and the first-order logic theorem-proving systems cannot be distinguished by first-order logic expressions.
Table 1 shows key symbols and definitions.
3. First-Order Logic Systems
This section reviews first-order logic. This section culminates with a discussion of aspects of legal reasoning in ErgoAI.
First-order logic expressions have the logical binary connectives ∧, ∨, and → and a unary Boolean operator ¬. They also have variables and the quantifiers ∀ and ∃. First-order logic quantifiers only apply to variables. Functions are relations that take one or more arguments from their domains, and each function returns a value from its range. Function ranges are often subsets of their domains. Predicates are relations in first-order logic that take one or more arguments from the domain. All predicates have the range {true, false}. It is the variables, quantifiers, functions, and predicates that differentiate first-order logic from more basic logics. We assume all logic expressions are well-formed, finite, consistent, and in first-order logic.
The focus here is first-order logic proofs that are syntactic structures. For instance, some such syntactic proofs are Hilbert-style proofs [43]. First-order logic proofs can be syntactically laid out in linear proof steps or as trees. The linear proof steps or trees have nodes. The nodes are connected with edges for easy understanding. Such proofs are easily presented on a computer screen. If the proof’s goal is to establish that an expression is provable, then this proof has this expression as its start. The nodes of a syntactic proof are facts, axioms, and expressions. The edges connect nodes by an application of an inference rule. If a logical formula is provable, then finding a proof often reduces to trial and error or exhaustive search.
The inference rule for Prolog-like theorem-proving systems for a subset of first-order logic is generally SLD-resolution [38]. Applying such inference rules may require substitution and unification. This is a mix of syntactic and semantic proofs, though ErgoAI can output a syntactic proof of provable first-order logic expressions.
Modern foundations of some tax laws are not all that different from ancient principles of tax law. Certain current U.S. tax laws are sometimes analogous to those from ancient governments [35,36,37]. While the meaning of words evolves, ideally the semantics of particular tax laws remain the same over time. Modern tax law may be more complex than ancient tax law, for instance because modern U.S. tax law requires millions of words to specify [39]. Nonetheless, modern tax law ideas often subsume many of the ancient principles of tax law.
Under the legal theory of Originalism, the words in the law should be given the meaning those words had at the time the law was enacted. Scalia and Garner’s book is on how to interpret written laws [40, p. 78]. For example, they say,
“Words change meaning over time, and often in unpredictable ways.”
Assumption 1 (Originalism). The principle of originalism states that a law should be interpreted based on the meaning of its words when the law was enacted.
Of course, law can be changed quickly by a legislature. Assuming originalism, the meaning of the words in the newly changed laws, when the new laws are enacted, is the basis of understanding these new laws.
Specifying meanings or words for theorem-proving systems uses atoms. Atoms are labels for constants. Atoms are made from alphanumeric characters provided the first character is a lower-case alphabetic character. Atoms also serve as labels for predicates and functions, but the focus here is on atoms representing meanings by acting as labels for constants.
Program-terms are atoms, constants, numbers, and variables. In most programming languages atoms and program-terms are specified by regular expressions. The language of a regular expression is all strings that can be generated by it. A context-free language is all strings that can be generated by a particular context-free grammar. Regular languages are context-free languages.
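For instance, a regular-expression check for the simple atom form described above (a sketch; real ErgoAI and Prolog syntax also admits quoted atoms and other forms this check ignores):

import re

# Simple atom form from the text: alphanumeric characters with a
# lower-case alphabetic first character. Real Prolog/ErgoAI also allows
# quoted atoms, which this sketch ignores.
ATOM_RE = re.compile(r"[a-z][a-zA-Z0-9]*")

def is_atom(s: str) -> bool:
    return ATOM_RE.fullmatch(s) is not None

print(is_atom("employeeCompensation"))  # True
print(is_atom("Deduction"))             # False: upper-case first character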
Program-terms can be grouped together using context-free grammars. This holds true for the syntax of atoms in ErgoAI or Prolog. A context-free grammar can represent the syntax of parenthesized expressions in ErgoAI.
In tax law, some of the inputs may be parenthesized expressions. For example, some amounts owed to the government are compounded quarterly based on the “applicable federal rate” published by the U.S. Treasury Department. Such an expression may be best written as a parenthesized expression. So, the text of all well-formed input expressions can be defined by a context-free grammar. Of course, logical expressions or computer programs can only approximate the meanings of laws or tax rules. There may be a great deal of context beyond originalism that supports the meaning in laws.
An example of “context beyond originalism that supports the meaning in the natural language of law” is, for U.S. federal statutes, the reports of the House and Senate committees that drafted, in some cases modified, and ultimately approved the bills sent to the floor of the House and Senate to be voted on by the members of those chambers.
Definition 2 (Context-free grammar (CFG)). A context-free grammar is a tuple G = (N, Σ, P, S) where N is a set of non-terminal variables, Σ is a set of terminals or fixed symbols, P is a set of production rules so each p ∈ P is of the form A → α where A ∈ N and α ∈ (N ∪ Σ)*, and S ∈ N is the start symbol.
The expression (N ∪ Σ)* denotes zero or more elements from the set N ∪ Σ. Furthermore, ε is the empty string.
The language generated by a context-free grammar is all strings of terminals that can be generated by the production rules of a CFG. The number of strings in a language generated by a CFG is, at most, countably infinite.
The natural numbers are ℕ = {0, 1, 2, …} and the integers are ℤ = {…, −2, −1, 0, 1, 2, …}.
Definition 3 (Countable and uncountable numbers). If all elements of any set T can be counted by some or all of the natural numbers ℕ, then T has countable cardinality |T| ≤ ℵ₀. Equality holds when there is a bijection between T and a non-finite subset of ℕ. In this case, T is countably infinite. The real numbers ℝ have cardinality 2^ℵ₀, which is uncountable.
The term infinite means either countably infinite or uncountable.
Cantor showed ℵ₀ < 2^ℵ₀ while founding modern set theory. The assertion that there is no cardinality between ℵ₀ and 2^ℵ₀ is the continuum hypothesis. Gödel and Cohen showed the independence of the continuum hypothesis from a standard set of axioms for set theory [41,42,44].
It is well-known that the sets ℕ and ℤ have the same number of elements. This is because there is a bijection between ℕ and ℤ. This bijection maps 0 to 0; each positive number in ℤ is uniquely mapped to an even number in ℕ, and each negative number in ℤ is uniquely mapped to an odd number in ℕ. Clearly every element in ℤ is uniquely mapped to an element in ℕ. The converse also holds. Thus, |ℕ| = |ℤ| = ℵ₀.
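This bijection is directly computable; a small sketch:

def z_to_n(z: int) -> int:
    """Map 0 to 0, positive integers to even naturals, negatives to odds."""
    return 2 * z if z >= 0 else -2 * z - 1

def n_to_z(n: int) -> int:
    """Inverse of z_to_n."""
    return n // 2 if n % 2 == 0 else -(n + 1) // 2

assert all(n_to_z(z_to_n(z)) == z for z in range(-1000, 1001))
print([z_to_n(z) for z in (0, 1, 2, -1, -2)])  # [0, 2, 4, 1, 3]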
The next theorem is classical. In our case, this result is useful for applications of the Löwenheim–Skolem Theorems; see Theorem 8. The proof of the next theorem is based on showing a correspondence between ℤ and a certain CFG.
Theorem 1 (Domains from context-free grammars). A context-free grammar can generate a language with countably infinitely many strings.
Proof. Consider a context-free grammar G = (N, Σ, P, S) made of a set of non-terminals N = {S, A, D}, a set of terminals Σ = {0, 1, …, 9, −}, a set of productions P, and the start symbol S. Let ε be the empty string. The productions of P are:

S → DA | −DA
A → DA | ε
D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

So, the CFG G can generate a string corresponding to any integer in ℤ. All integers form a countably infinite set, completing the proof. □
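The countability in Theorem 1 can also be seen operationally: enumerating the generated strings by increasing length reaches any fixed integer string after finitely many steps. A sketch of such an enumeration:

from itertools import count, product

DIGITS = "0123456789"

def integer_strings():
    """Enumerate the language of the grammar in Theorem 1: every nonempty
    digit string, optionally preceded by '-', ordered by length. Any fixed
    string is reached after finitely many steps, so the language is
    countably infinite."""
    for length in count(1):
        for digits in product(DIGITS, repeat=length):
            s = "".join(digits)
            yield s
            yield "-" + s

gen = integer_strings()
print([next(gen) for _ in range(6)])  # ['0', '-0', '1', '-1', '2', '-2']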
The languages that can be defined by regular expressions are a proper subset of the languages that can be defined by context-free grammars. So the cardinality of the set of all atoms or expressions of atoms in ErgoAI and Prolog is at most ℵ₀.
All legal expressions from the set Law are countable. This is because each individual word in the domain of all words used in the set Law can be uniquely numbered by a value in ℕ. Indeed, each word can be represented by the numerical value of its UTF-8 representation. Separately, any numerical values required for computing taxes can also be represented using integers and parenthesized expressions.
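For example, reading a word’s UTF-8 bytes as one base-256 numeral assigns each word a unique natural number; a quick sketch:

def word_to_nat(word: str) -> int:
    """Read the word's UTF-8 bytes as one base-256 numeral, giving each
    word a unique natural number."""
    return int.from_bytes(word.encode("utf-8"), byteorder="big")

print(word_to_nat("deduction"))   # a unique natural number for the word
print(word_to_nat("déduction"))   # non-ASCII words are numbered as well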
3.1. Basic Logic
This subsection starts by discussing propositional logic. The discussion then goes into the foundations of mathematical logic.
Propositional, or 0-order, logic only has propositions, or Boolean variables with no arguments. Propositional logic has no functions or variables, but propositional logic variables can be joined with other propositional variables using the AND (∧) as well as the OR (∨) binary operators. A propositional variable X can also be negated, ¬X.
A tautology is a formula that is always true. For example, consider a Boolean propositional variable X representing when a purchase is for business. So X has range {true, false}. Then f ≡ X ∨ ¬X must always be true, so f is a tautology. The formula X ∧ ¬X is a contradiction. Contradictions are always false. All propositional logic expressions are either true or false. Proving a propositional logic expression to be true or false can be done mechanically by trying all possible true and false values for each Boolean variable.
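This mechanical check is a truth-table search; a minimal sketch:

from itertools import product

def is_tautology(formula, num_vars):
    """Truth-table check: the formula must be true for every assignment
    of True/False to its num_vars Boolean variables."""
    return all(formula(*vals) for vals in product([True, False], repeat=num_vars))

print(is_tautology(lambda x: x or not x, 1))   # True: X OR (NOT X)
print(is_tautology(lambda x: x and not x, 1))  # False: a contradiction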
Formulas in first-order logic may be provable while not being tautologies. This is because first-order logic formulas can have variables that take different values. Substituting some values for the variables in a first-order formula may make it true, while substituting other values may not make the formula true. This contributes to the complexity of provability of first-order formulas.
The statement ⊢ f indicates that the first-order logic formula f is syntactically provable in the logical system at hand; in our case, using a Hilbert-style proof. The set Φ is one or more first-order logic formulas. The general expression Φ ⊢ g means the sentence g is syntactically provable using the set Φ, the domain of this set of formulas, and one or more inference rules.
Suppose g is a first-order logic formula. A formula may have free variables. A free variable is not bound or restricted. If a variable is quantified, it is not a free variable. A quantifier Q is applied as Q x: g(x), where g is a first-order formula. If Q = ∀, then this means g(x) holds for all x. Or, if Q = ∃, then g(x) holds for some x.
First-order logic theorem-proving programming languages such as ErgoAI or Prolog default to universal quantification ∀x for any free variable x.
Definition 4 (Logical symbols of a first-order language [41,42,43]). The logical symbols of a first-order language are,
1. variables
2. binary logic operators ∧, ∨, → and a unary logic operator ¬
3. quantifiers ∀ and ∃
4. scoping using () or [] or a such-that symbol :
5. a concatenating symbol ,
6. equality =
In a logic language, functions are relations giving values for asserting facts. So a function’s range may be a subset of its domain. Predicates are relations whose range is {true, false}. Predicates make statements about facts.
Definition 5 (𝓛-language). If 𝓛 = (L, D, σ), then this is a first-order 𝓛-language where L is the set of logical operators and quantifiers, D is the domain, and σ is the signature of the system, which is its constants, predicates, and functions.
The signature σ contains 𝓛’s constants, functions, and predicates. The signature does not include logical operators and quantifiers [43].
Definition 6 (Logic term). Consider a first-order language 𝓛; then a logic term is a well-formed expression made from all constants, variables, and functions using a finite number of applications of the recursive definition: If t₁, …, tₙ are terms, then f(t₁, …, tₙ) is also a term, where f is an n-ary function.
A term does not have logical connectives, quantifiers, or predicates.
A formula f is an expression of an 𝓛-language. In this case, we write f ∈ 𝓛. A set of one or more formulas of an 𝓛-language is written as Φ = {fᵢ}, if the index set is understood. A formula f has no free variables if each variable in f is quantified by either ∀ or ∃.
Definition 7 (Logic formula). Consider a first-order language 𝓛; then a logical formula is,
F1. P(t₁, …, tₙ) where P is a predicate and t₁, …, tₙ are terms
F2. t₁ = t₂ where t₁ and t₂ are terms
F3. ¬f where f is a formula
F4. f ∧ g and f ∨ g, where f and g are formulas
F5. ∀x: f and ∃x: f where x is a variable and f is a formula.
An atomic formula is any formula defined by F1 and F2.
Assigning a formula symbol f to a logic expression is done using ≡. For example, f ≡ X ∨ ¬X indicates the formula f is the logic expression X ∨ ¬X.
A formula may be neither true nor false, if the formula has free variables.
Definition 8 (Sentence). Consider a first-order logic language 𝓛 and a formula f ∈ 𝓛. If f has no free variables, then f is a sentence.
Atomic formulas have at most one predicate. If an atomic formula is made of only sentences, then it is a fundamental building block for logic or programming statements.
The next result is classical, see [42,43].
Theorem 2 (First-order logic is semi-decidable). Suppose 𝓛 is a first-order logic language, and let f ∈ 𝓛; then
1. If f is true, then there is an algorithm that can verify f’s syntactic truth in a finite number of steps.
2. If f is false, then in the worst case there is no algorithm that can verify f’s syntactic falsity in a finite number of steps.
An interpretation defines a domain. It also defines semantics for constants, functions, and predicates. Given a first-order language 𝓛 and f ∈ 𝓛, an interpretation that makes the sentence f true is a model.
Definition 9 (First-order logic interpretation [41; 43, p. 139]). Consider a first-order logic language 𝓛 and a set I. The set I is an interpretation of 𝓛 iff the following holds:
1. The interpretation I has a domain D_I
2. If there is a constant c ∈ 𝓛, then it maps uniquely to an element c_I ∈ D_I
3. If there is a function f ∈ 𝓛 where f takes n arguments, then there is a unique F: (D_I)ⁿ → D_I where F is an n-ary function
4. If there is a predicate r ∈ 𝓛 where r takes n arguments, then there is a unique n-ary predicate R: (D_I)ⁿ → {true, false}.
An interpretation also defines the semantics of all elements of σ.
An interpretation does not have variables or quantifiers. This is because interpretations give fixed meaning or semantics to logical formulas.
The semantics of an interpretation I of a first-order logic language 𝓛 is defined with natural language, mathematics, or mathematical examples. Consider 𝓛’s domain D, signature σ, and an interpretation I. The interpretation I has domain D_I. So I can be applied to a sentence f by substituting values from D_I into F, where F corresponds to f. If a variable x occurs more than once in F, then the substitution must consistently replace each occurrence of x with the same value.
Suppose F corresponds to f for the interpretation I. The expression f^I means values from I are substituted into f, giving F.
Definition 10 (Logical model). Consider a first-order language 𝓛, a set of sentences Φ ⊆ 𝓛, and an interpretation I of 𝓛; then I is a model 𝔐 of Φ iff all f ∈ Φ are so that each f^I is true, where each f^I corresponds uniquely with some f.
Models are denoted by 𝔐 and 𝔑. Since a model is an interpretation, let M be its domain D_I. Similarly, 𝔑’s domain is N. Therefore, 𝔐 ⊆ 𝔑 means M ⊆ N and the semantics of M coincides with the equivalent semantics in N. Even further, we assume 𝔐 ⊆ 𝔑 means both 𝔐 and 𝔑 share σ from 𝓛. Finally, for any model 𝔐 the cardinality of its domain M is denoted as |M|. So, |𝔐| = |M|.
The expression ⊨ f, for f ∈ 𝓛, indicates all interpretations make the sentence f true. Model-theoretic proofs use logical models to prove statements. If Φ is a set of one or more sentences of a first-order 𝓛-language and g is a single sentence of 𝓛, then Φ ⊨ g holds exactly when all models of Φ are models of g. For example, suppose an interpretation I contains the natural numbers ℕ, and no other integers, in its domain. Therefore, given the sentence f ≡ ∀x: x ≥ 0, the expression I ⊨ f is true. However, if the domain of I is updated to contain more than the non-negative integers, say all of ℤ, then I ⊭ f. Going further, if g ≡ ∃x: x ≥ 0, then I ⊨ g but ⊨ g does not hold.
Definition 11 (First-order logic semantics). Given a set of first-order logic formulas Φ and an interpretation I where each f ∈ Φ corresponds to f^I, then Φ is:
Valid if every f^I is true, for all interpretations I
Inconsistent or Unsatisfiable if no interpretation I makes every f^I true
Consistent or Satisfiable if Φ is true under at least one interpretation I
If a set of formulas is inconsistent, then anything can be proved. In syntactic proof terms, Φ is inconsistent iff Φ ⊢ g and Φ ⊢ ¬g for a sentence g [41]. For example, {P(a), ¬P(a)} are inconsistent formulas, where P is a predicate.
If a set of formulas Φ is valid for an interpretation I, then this is a semantic or model-theoretic proof of Φ restricted to I.
Suppose Φ is a set of first-order logic formulas. If Φ is syntactically provable, then it is valid and hence semantically provable since it satisfies all interpretations. Given g ∈ 𝓛, write Φ ⊨ g when all models of Φ are models of g.
Theorem 3 (Logical soundness). Consider a first-order logic language 𝓛. For any set of one or more first-order logic formulas Φ ⊆ 𝓛 and a first-order sentence g ∈ 𝓛: If Φ ⊢ g, then Φ ⊨ g.
Gödel’s completeness theorem [43, p. 202] indicates if a set of first-order logic formulas is valid, hence semantically provable, then it is syntactically provable. If g is true for all models of Φ, then g is semantically provable.
Theorem 4 (Logical completeness). Consider a first-order logic language 𝓛. For any set of one or more first-order logic formulas Φ ⊆ 𝓛 and a first-order logic sentence g ∈ 𝓛: If Φ ⊨ g, then Φ ⊢ g.
3.2. The Models 𝔐 and 𝔑
This subsection discusses models used in the remainder of the paper. These models are for first-order theorem-proving systems and LLMs.
The logic model 𝔐 is based on tax law as expressed in first-order logic programs. The model 𝔐 encodes tax law assuming originalism. That is, 𝔐 fixes all word and phrase meanings from when the laws were enacted, see Assumption 1. For example, the constant minimalTaxableIncome can be associated with $29,200 in 2024, but $32,200 in 2026. So, minimalTaxableIncome in 𝔑 may be the more recent threshold of $32,200 from 2026 while minimalTaxableIncome in 𝔐 is the previous threshold of $29,200 from 2024. Of course, minimalTaxableIncome can be a function that differentiates its return value by year.
All first-order logic formulas using 𝔐 are sentences. All sentences are assumed to be consistent. Recall languages, such as ErgoAI or Prolog, assume universal quantification for any free variables.
The logic model 𝔑 uses LLMs on user input for its domain. The model 𝔑 should sufficiently correspond with the model 𝔐 for the first-order logic programs used for tax law defined using 𝓛. Particularly, the sentences 𝔐 and 𝔑 are applied to are assumed to be consistent.
Theorems 3 and 4 tell us that any set of first-order sentences that has a valid model can be syntactically proved. And symmetrically, any set of first-order sentences that can be syntactically proved has a valid model, so it can be semantically proved.
Theorem 2 indicates, if a set of first-order sentences Φ is valid, then there is an algorithm that can find a syntactic proof of Φ. However, if a set of first-order sentences Φ is not valid, then in the worst case, there is no algorithm that can show Φ is not valid. So, given a set of true first-order sentences Φ, we can use resolution theorem-proving algorithms to determine their truth [41]. Resolution theorem-proving can always prove valid statements in first-order logic. These valid statements must be expressed in clausal form. This can be done by direct translation or by building sufficiently equivalent clauses. This extends to the subset of first-order logic found in ErgoAI or Prolog. This is because SLD-resolution operates on a subset of first-order logic. This subset is first-order logic Horn-clauses. We are assuming that first-order logic Horn-clauses are sufficient to properly express any statement from the set Law.
The focus here is on ErgoAI as a syntactic theorem prover for first-order logic Horn-clauses. Any first-order statement that is provable in ErgoAI has a valid model, see Theorem 3. Nonetheless, the Horn-clauses of first-order logic in ErgoAI or Prolog are still semi-decidable, so Theorem 2 applies. Particularly, all true Horn-clause sentences are provable since Horn-clauses are a subset of first-order logic. If the Horn-clause sentences are false, then verifying their falsity cannot, in the worst case, be done. This is because functions are central to the unverifiability of false first-order sentences, and functions in Horn-clauses are sufficiently powerful to maintain this unverifiability.
Unification and negation-as-failure add complexity to first-order logic programming interpretations for Horn-clauses [49,58]. Furthermore, facts can change. Handling a logic program’s database change in first-order logic programming can be done using additional semantics [45].
3.3. Examples in ErgoAI
This subsection gives examples using ErgoAI.
ErgoAI has frame-based syntax, which adds structure to traditional Prolog statements; the frame-based syntax also includes object-oriented features [3]. The ErgoAI or Prolog expression E :- B is a rule. This rule indicates that if the body B is true, then conclude the head E is true.
Listing 1 is an ErgoAI rule for determining if an expenditure is a deduction. The notation ?X is that of a variable. The expression ?X:Class indicates the variable ?X is an instance of Class. In this listing, the variable ?X also has Boolean properties ordinary, necessary, and forBusiness.
Listing 1: A rule in ErgoAI in frame-based syntax

?X:Deduction :-
    ?X:Expenditure,
    ?X[ordinary => boolean],
    ?X[necessary => boolean],
    ?X[forBusiness => boolean].
The rule in Listing 1 has a body indicating that if there is an ?X that is an expenditure with true properties ordinary, necessary, and forBusiness, then ?X is a deduction. This rule is taken as an axiom.
Listing 1 can be expressed in terms of provability. This is because ErgoAI facts are axioms. ErgoAI rules may be axioms or logical inference rules.
The ErgoAI code in Listing 2 has three forBusiness expenses. It also has two donations that are not forBusiness. Since these two donations are not explicitly forBusiness, by negation-as-failure ErgoAI and Prolog systems assume they are not forBusiness. The facts are the first five lines. There is a rule on the last line. Together the facts and rules form the database of axioms for an ErgoAI program.
Listing 2: Axioms in ErgoAI in frame-based syntax

employeeCompensation : forBusiness.
rent : forBusiness.
robot : forBusiness.
foodbank : donation.
politicalParty : donation.

?X : liability :- ?X : forBusiness.
A program in the form of a query of the database in Listing 2 is in Listing 3. This listing shows three matches for the variable ?X.
Listing 3: A program in ErgoAI in frame-based syntax

ErgoAI> ?X : forBusiness.
>> employeeCompensation
>> rent
>> robot
Hence an ErgoAI system can prove employeeCompensation, rent, and robot all are forBusiness. We can also query the liabilities which gives the same output as the bottom three lines of Listing 3.
There are many rules that can be found in tax statutes. Many of these rules are complex. Currently there are millions of word instances in the U.S. Federal tax statutes and many more word instances in U.S. Federal case tax law. Of course most of these words are repeated many times, though they may be in different contexts.
4. Theorem Proving, Logic Models, and LLMs
This section focuses on logical model theory as it applies to LLMs along with theorem-proving. This section wraps up by discussing how logic programming relates to first-order logic semantics.
Definition 12 (Theory for a model). Consider a first-order logic language 𝓛 and a model 𝔐 for 𝓛; then 𝔐’s theory is,

Th(𝔐) = { f ∈ 𝓛 : 𝔐 ⊨ f }.

Given a model 𝔐 of a first-order language 𝓛, then the theory Th(𝔐) is all true sentences from 𝓛 for the model 𝔐.
In the case of LLMs, tokens are translated into vectors (embeddings) in high-dimensional space. Tokens represent words, word fragments, or punctuation symbols. Each feature of a token has a dimension. In many practical cases there are thousands of dimensions [46]. Similar token vectors have close semantic meanings. This is a central foundation for LLMs [47,48].
Semantics in LLMs is based on the context-dependent similarity between token embeddings [47]. These similarity measures form an empirical distribution. Over time, such empirical distributions change as the meanings of words evolve. We believe our users will use contemporary jargon to enter their legal questions. Contemporary jargon’s semantics may be different from the semantics of the laws when the laws were enacted. See Assumption 1.
Given two embeddings x and y, both from the set of all embeddings V, consider a similarity measure s: V × V → [−1, 1]. Values close to 1 indicate high similarity and values close to −1 indicate embeddings with close to opposite meanings. The function s may be a cosine similarity measure, for instance. In any case, for an embedding x suppose there is a set of embeddings Vₓ ⊆ V where s(x, y) ≥ t for all y ∈ Vₓ; then each such y is similar enough to x so y can be substituted for x, given a suitable threshold t for a particular context.
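A small sketch of such a similarity test, using cosine similarity over toy three-dimensional embeddings (real LLM embeddings have thousands of dimensions; the vectors and the threshold t below are purely illustrative):

import math

def cosine_similarity(x, y):
    """s(x, y) in [-1, 1]; values near 1 mean semantically close embeddings."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norms

# Toy 3-dimensional embeddings; real LLM embeddings have thousands of
# dimensions and these numbers are purely illustrative.
embeddings = {
    "salary": (0.90, 0.10, 0.40),
    "bonus":  (0.85, 0.15, 0.35),
    "salad":  (0.05, 0.95, 0.10),
}

t = 0.95  # hypothetical substitution threshold for this context
s = cosine_similarity(embeddings["salary"], embeddings["bonus"])
print(f"s(salary, bonus) = {s:.3f}, substitutable: {s >= t}")  # ~0.998, True
print(f"s(salary, salad) = "
      f"{cosine_similarity(embeddings['salary'], embeddings['salad']):.3f}")  # ~0.190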
The expression ⊨ₛ indicates semantic closeness by adding that close token embeddings can be substituted for each other.
Definition 13 (Extended logical semantics for LLMs). Consider a first-order logic language 𝓛 and a set of first-order sentences Φ ⊆ 𝓛 with a model 𝔐. Then 𝔐 ⊨ₛ Φ iff all pairs x, y ∈ M that are substituted for each other, where M is 𝔐’s domain, have similarity s(x, y) that is above a suitable threshold.
In Definition 13, the expression 𝔐 ⊨ₛ Φ means given the model 𝔐, of similar domain elements, the sentences Φ are all true. The similarity score can be computed with the values in a feature table such as in Figure 3.
Consider a first-order language 𝓛, a sentence f ∈ 𝓛, and a similarity function s. The next implication may fail to hold,

𝔐 ⊨ₛ f implies 𝔐 ⊨ f.

Here is an example when this implication fails. Suppose the similarity function s is so that s(x, y) ≥ t for some x ≠ y where x, y ∈ M. This can happen for the first-order logic formula,

f ≡ ∃x ∃y: ¬(x = y).

For the formula f to be true, there must be at least two distinct items in the domain of 𝔐.
Consider the tax deductibility of salary and bonuses for compensation. Suppose both salary and bonus are distinct inputs for the domain M of the model 𝔐. Further, these are the only two elements in M. A similarity function s may make it so that salary and bonus are identified with each other in preparation for theorem-proving with 𝔐. So there is a single meaning for both salary and bonus that is fed into the domain of the model 𝔐. That is, there is a single element in M. This means 𝔐 ⊨ₛ f, and yet 𝔐 ⊭ f, since f cannot be true for a domain of a single element.
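The collapse can be checked directly by evaluating f ≡ ∃x ∃y: ¬(x = y) over the domain before and after the similarity-driven merge; a sketch:

from itertools import product

def exists_two_distinct(domain):
    """Evaluate f = 'there exist x and y with x != y' over a finite domain."""
    return any(x != y for x, y in product(domain, repeat=2))

M = {"salary", "bonus"}               # before the merge: two meanings
print(exists_two_distinct(M))         # True: M satisfies f

M_merged = {"salary"}                 # after s conflates bonus with salary
print(exists_two_distinct(M_merged))  # False: f fails in the collapsed domain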
There are ways to handle such similarity functions by sacrificing relations between models. For example, consider a sentence f ∈ 𝓛 where f has n arguments; then construct f′ that has n − 1 arguments. This allows us to use a similarity function s so that f corresponds with f′. Particularly, if s(x, y) ≥ t for distinct arguments x and y, then we can have,

𝔐 ⊨ₛ f and 𝔑 ⊨ f′.

Since f ≠ f′, the models 𝔐 and 𝔑 are not being compared under the same conditions.
4.1. How the Neural Tax Networks System Works
Our system requires translation of natural language tax law to first-order logic or ErgoAI Horn-clause programs.
Definition 14 (Knowledge Authoring). Consider a set Law, of tax laws and clarifications in natural language, then finding equivalent first-order logic rules and facts is knowledge authoring. Performing knowledge authoring and placing the generated rules and facts in a set R is expressed as, Law → R.
Automated knowledge authoring is very challenging [50]. We do not have an automated solution for knowledge authoring of tax law, even using LLMs. Consider the first sentence of U.S. Federal Law defining a business expense [51],
“26 §162 In general - There shall be allowed as a deduction all the ordinary and necessary expenses paid or incurred during the taxable year in carrying on any trade or business, including—”
The full definition, not including references to other sections, has many word instances. The semantics of each of the words must be defined in terms of legal facts and rules.
A tax situation is entered by a user through the front end. The tax question and facts are passed to the backend. The backend translates the tax question and facts into ErgoAI. In response, the proofs are sent to the front end and presented.
Our goal is to have users enter their questions in a structured subset of natural language with the help of an LLM-based software bot. This subset of natural language will be using contemporary jargon. The user input is the set of natural language expressions U. The set U contains user supplied facts and a tax question. The set R is made of ErgoAI facts, rules, and queries. It is consistent. The basis of R is the set Law of tax laws and clarifications. The set R will be built using knowledge authoring, perhaps with a great deal of human labor. An LLM will help map these natural language statements and a tax question in U into ErgoAI facts and queries. These facts and queries must be compatible with the logic rules and facts in the set R. We have not yet settled on how to leverage AI to map from user entered natural language tax questions into ErgoAI facts and queries.
Definition 15 (Determining facts and queries for R). Consider natural language facts and a tax question in a set U. We write U → H for the map of U into the set H of ErgoAI expressions that are compatible with the ErgoAI facts and rules R of the set Law. The set H is a tax query along with the relevant facts.
The expression 𝔑 ⊨ₛ H indicates the model 𝔑 for the ErgoAI expressions makes the ErgoAI expression H true. The operator ⊨ₛ allows similar domain elements to be substituted for each other. We can run an LLM to assist in generating H.
Using LLMs, Definition 15 depends on natural language word similarity. The words and phrases in user queries must be mapped to a query and facts compatible with our ErgoAI rules R so ErgoAI can work to derive a result.
Definition 14 and Definition 15 serve as a foundation for the next definition.
Definition 16. Consider the ErgoAI rules and facts R representing the set Law, and suppose the compatible ErgoAI user tax question has a query and facts H. The query and facts are written as first-order logic sentences in H where 𝔑 ⊨ₛ H, since they must be compatible with the rules R. Now a theorem prover such as ErgoAI can determine whether, R ⊢ H.
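Definitions 14 through 16 describe a pipeline from natural language to a proof attempt. The sketch below outlines it in Python; the helper functions are hypothetical stand-ins for the LLM translation and the ErgoAI prover, not an existing API:

def llm_translate(U, R):
    """Definition 15 (stub): map the natural-language facts and question
    in U to ErgoAI facts and a query compatible with the rule set R.
    A real implementation would call an LLM; this stand-in is fixed."""
    return ["businessLunch : forBusiness.", "?X : forBusiness."]

def ergoai_prove(rules, goal):
    """Definition 16 (stub): ask whether R |- H. A real implementation
    would invoke the ErgoAI reasoner and return its proof."""
    return f"proof of {goal} (stub)"

def answer_tax_question(U, R):
    H = llm_translate(U, R)                  # U -> H, compatible with R
    return ergoai_prove(R + H[:-1], H[-1])   # attempt to prove the query

R = ["?X:Deduction :- ?X:Expenditure, ..."]  # authored from the set Law
print(answer_tax_question(["I bought a client lunch; is it deductible?"], R))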
Figure 1 shows a high-level vision of the Neural Tax Networks CTP system. Currently, we are not doing the LLM mapping automatically.
Figure 1 depicts knowledge authoring as Law → R. Since Law is the set of natural language laws and clarifications, they are based on the semantics of when the said laws were enacted, by Assumption 1. Next, natural language facts and a tax question are placed in a set U. We expect the natural language in U to be in contemporary jargon. Then U is converted into first-order ErgoAI expressions in a set H which are compatible with R. That is, we will use an LLM to assist constructing H where 𝔑 ⊨ₛ H holds. Finally, a theorem-proving system tries to find a proof R ⊢ H, if H is provable. A new set H will be computed each time a user asks a new tax question.
The boxes in Figure 2 indicate parts of our processing that may be repeated many times in one user session, always in the order: a user enters their facts and queries in U, and then the system uses an LLM to help compute H so that 𝔑 ⊨ₛ H. Next, the system attempts to prove R ⊢ H. This figure highlights knowledge authoring, the semantics of LLMs, and syntactic provability.
Suppose H is created so that 𝔑 ⊨ₛ H. Theorem 3 indicates that if R ⊢ H, then R ⊨ H. In other words, if R is valid in the theorem-proving model 𝔐 and if H is constructed so that R ⊢ H, then 𝔐 ⊨ H and 𝔑 ⊨ₛ H both hold.
To validate our system we have built a number of functional tests. These mimic user entered tax situations where the tax answers are known. As we add LLMs, we will add testing with similar words and measure how the system performs. These measurements will first verify that the proofs, presented on the front end, are correct.
4.2. ErgoAI Examples
This section gives some tax-law related examples of ErgoAI. We use first-order logic capabilities for Horn-clauses in ErgoAI in frame-mode.
If there is an ErgoAI rule or axiom E :- B, then E is the head and B is the body, where E holds if B is true. Such rules are all in the set R.
The expression ?X₁:B, …, ?Xₙ:B indicates all variables ?Xᵢ, for 1 ≤ i ≤ n, are class instances of B.
The model 𝔐 has domain M, so

M = { employeeCompensation, rent, robot, foodbank, politicalParty }.

Given 𝔐 and f ≡ ∃?X: ?X:forBusiness, the next statement is true:

𝔐 ⊨ f.

That is, the interpretation 𝔐 makes employeeCompensation:forBusiness, rent:forBusiness, and robot:forBusiness true. Hence 𝔐 is a model for f. Alternatively, consider the model 𝔑 and its domain

N = { foodbank, politicalParty },

then 𝔑 ⊭ f is true. This is because there is no ?X in the model 𝔑 so that ?X:forBusiness in 𝔑.
A simplified example for deducting a lunch expense highlights key issues for computing with LLMs. Consider a model 𝔐 and its domain M,

M = { salad, burger, zillion-dollar-frittata }.

This is a model for a lunch expenditure. Features of salads and burgers can include: cost, calories, health-scores, and so on. These food features are in a feature table such as Figure 3. But a zillion-dollar-frittata costs about 200 times the cost of a salad or burger. So the zillion-dollar-frittata is not an ordinary expenditure. For example, the zillion-dollar-frittata property or field ordinary disqualifies it from being deductible. All three of salad, burger, and zillion-dollar-frittata may have some close dimensions, but they are not that similar since their cost dimension (feature) is not in alignment.
5. Löwenheim–Skolem Theorems and Elementary Equivalence
This section gives the main results of this paper. Its final subsection gives highlights of the Neural Tax Networks system.
Our theorem-proving system syntatically proves valid theorems in first-order logic. These results suppose tax law and clarifications are in a set Law. The set Law is translated to first-order logic suitable for a legal theorem-proving system, such as ErgoAI.
We assume the semantics of the law has an at most countable set of meanings from when the laws were enacted. This is assuming originalism for the legal rules in a theorem-proving system for first-order logic. The culmination of this section then shows: if new semantics are introduced by LLMs, then a first-order theorem-proving system will be able to prove the same results from the original semantics as well as compatible new semantics introduced by the LLMs. The defeasible reasoning in theorem-proving systems such as ErgoAI may be able to help when there is a semantic mismatch.
In traditional logic terms, first-order logic cannot differentiate between the original theorem-proving semantics and the new LLM semantics, presuming the new semantics introduced by the LLM are compatible with the semantics for the theorem-proving system. Of course, since Horn-clause logic is a subset of first-order logic it also cannot differentiate between the original theorem-proving semantics and the new LLM semantics.
Theorem 5 (Special case of the upward Löwenheim–Skolem [43]). Consider a first-order set of formulas Φ. If Φ has a countably infinite model, then Φ has an uncountable model.
Michel et al. [53] indicate that English adds about 8,500 new words per year. See also Petersen et al. [54]. Originalism requires the retention of the meanings of words from when laws are passed. The meanings, from different eras, may not remain in common use, but these meanings remain available for lookup. So, we assume words or meanings do not leave a natural language; rather, they may fade from common use. These word meanings all remain a part of natural language given a suitable context. Since we are proposing automating semantic lookup, word meanings must be maintained by context. Particularly, this can be done so we can perform legal reasoning assuming originalism. Also, a word may change its meaning in common use while the word itself remains in common use, just with a different commonly-understood meaning.
A meaning is represented by a set of feature values. For example, Figure 3 shows meanings represented by feature values in each row of a feature table. Each row represents one word, token, or atom. In discussing meanings from a language perspective, we use words for the rows of the feature table. For LLMs the rows in the feature table are tokens. Feature tables for theorem-proving systems have atoms in their rows.
Figure 3. A subset of a feature table showing several words and a few features
To keep our discussion simple, we will say each row represents a natural language word. If a word w has two meanings, then it will be represented in two rows wᵢ and wⱼ. Each row has a different meaning. Theorem 6 is based on adding new words with new meanings, adding new meanings to current words, or just adding new meanings. There are several ways we can represent new meanings: (1) a new meaning can be represented as a set of features with different values from any existing row, or (2) a new meaning may require new features.
We assume there will always be new meanings that require new features. So, we assume the number of feature columns is countable over all time. Just the same, we assume the number of words an LLM is trained on is countable over all time. In summary, over all time, we assume the number of rows and columns in this figure are both countably infinite. Words that are synonyms have the same feature sets, but these words will be in different rows. There may even be features based on etymology or spelling. Homonyms are each individually listed with their different feature sets since homonyms have different meanings.
The number of meanings is uncountable based on the next argument. Over all time, we can diagonalize over all words and all features. Figure 3 shows a word w̄ whose features must be different from any other word with a different meaning. Therefore, any new word or meaning, or an additional meaning for a word, is such a w̄. So w̄ is a new word or new meaning, and it will never have the same feature values as any of the other words or meanings. There may even be new meanings that are not associated with any word. So, by diagonalization, if there is a countable number of feature columns and a countable number of rows, there must always be additional meanings not listed.
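The diagonal construction can be sketched directly: given any countable listing of meanings as feature rows, build a row that differs from row i at feature i, so the new meaning cannot appear anywhere in the listing. A sketch with Boolean feature values:

# Rows are listed meanings, columns are Boolean feature values. The
# diagonal row differs from row i at feature i, so it equals no listed row.
table = [
    [True,  False, True ],   # meaning 0
    [False, False, True ],   # meaning 1
    [True,  True,  False],   # meaning 2
]

diagonal_meaning = [not table[i][i] for i in range(len(table))]
print(diagonal_meaning)           # [False, True, True]
print(diagonal_meaning in table)  # False: a meaning missing from the listing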
In the case of Neural Tax Networks, some of these new words represent goods or services that are taxable. Ideally, the tax law and clarifications will not have to be changed for such new taxable goods or services. These new word meanings will supply new logical interpretations for tax law.
Assumption 2. In natural language some words will get new meanings over time. Also new words with new meanings will be added to natural language over time.
For LLMs, assuming languages always add new words and meanings over time, it can be argued that natural language has an uncountable model if we take a limit over all time. This is even though this uncountable model may only add a few expressions related to tax law each year. To understand the limits of our multimodal system, our assumption is that such language growth and change goes on forever. Some LLMs currently train on data sets with on the order of trillions of word instances or tokens. This is much larger than the number of atoms used by many theorem-proving systems. Comparing countable and uncountable sets in these contexts may give us insight.
Assumption 1 states the original meaning of words for our theorem-proving system is fixed. In other words, the meaning of the words in law is fixed from when the laws were enacted. We assume originalism since we recognize that the meaning of words can change over time. An example is provided in Reading Law: The Interpretation of Legal Texts by Scalia and Garner [40]. Under the section entitled “Semantic Canons/Fixed Meaning Canon,” Scalia and Garner note that the meanings of words change over time. Furthermore, their meanings often change in unpredictable ways. They give as an example the statement attributed to Queen Anne that the architecture of St. Paul’s Cathedral was “awful, artificial and amusing.” By “awful” she meant “awe inspiring.” This contrasts with how the word “awful” is typically used today, in which it does not convey a positive impression of something but instead connotes a negative feeling about the thing being described. Thus, as Scalia and Garner state, it would be quite wrong to ascribe to the Queen’s 18th-century statement about the architecture of St. Paul’s Cathedral the 21st-century meaning of her words.
Although this is a somewhat extreme example, it clearly shows that to properly determine the meaning of the words of a statute (and thus apply the statute in accordance with its terms) the statute must be interpreted and applied using the meaning of the words at the time the statute was written. This is because originalism is the only way to determine what was intended by the legislative body that enacted the statute.
Homonyms can arise over time by having words used for one meaning at a particular point in time take on a second meaning when used in other contexts. The meanings of such words in legal writing can be determined by application of the “whole text” canon discussed by Scalia and Garner [40], which requires the reader to consider the entire text of the document in which the word is used in order to determine its meaning. This is done in conjunction with the “presumption of consistent usage” canon, which states that a word or phrase is presumed to bear the same meaning throughout a text, and that a material variation in terms suggests a variation in the meaning. This is Scalia and Garner’s “Presumption of Consistent Usage” canon [40, p. 170]. Taken together, these rules of statutory interpretation and application allow an expert system of the type being developed to use the capabilities of LLMs to determine, based on context, potential dual meanings. This will determine with an extremely high degree of accuracy what meaning should be assigned to a word. By way of example, the sentence “the light poll is bent and must be replaced” can readily be distinguished from the sentence “the voting poll closes at 8 PM.” This is done not only by the use of a different adjective immediately before the word “poll” but also by the use within the same sentence of the word “bent” (in the case of the “light poll”) and “closes” (in the case of the voting poll).
5.1. Model Theory
Theorem 1 shows context-free grammars can specify countably infinite domains for theorem-proving systems. This is by constructing a countable number of atoms, program-terms, or expressions. Using a similarity measure s, suppose each word or token vector x has a subset of similar tokens Vₓ where |Vₓ| ≤ c, for a constant integer c.
Theorem 6. Taking a limit over all time, LLMs with similarity sets of constant bounded sizes have an uncountable number of meanings.
In some sense, Theorem 6 assumes knowledge will be extended forever. This assumption is based on the idea that as time progresses new meanings will continually be formed. This appears to be tantamount to assuming social constructs, science, engineering, and applied science will never stop evolving.
The next definitions relate different models to each other. The term ‘elementary’ can be interpreted as ‘fundamental.’
Definition 17 (Elementary extensions and substructures [41]). Consider a first-order language 𝓛. Let 𝔐 and 𝔑 be models of 𝓛 and suppose 𝔐 ⊆ 𝔑.
Then 𝔑 is an elementary extension of 𝔐, or 𝔐 ⪯ 𝔑, iff every first-order formula f ∈ 𝓛 and all a₁, …, aₙ ∈ M are so that

𝔐 ⊨ f(a₁, …, aₙ) iff 𝔑 ⊨ f(a₁, …, aₙ).

Also, if 𝔐 ⪯ 𝔑, then 𝔐 is an elementary substructure of 𝔑.
As before, 𝔐 ⊆ 𝔑 presumes the semantics of 𝔐 coincides with the equivalent semantics in 𝔑.
Definition 18 (Elementary equivalence [41]). Consider a first-order language 𝓛. Let 𝔐 ≡ 𝔑 mean 𝔐 and 𝔑 are elementary equivalent models of 𝓛. Then 𝔐 ≡ 𝔑 iff every first-order sentence f ∈ 𝓛 is so that

𝔐 ⊨ f iff 𝔑 ⊨ f.
Given a model 𝔐 of a first-order language, then Th(𝔐) is the first-order theory of 𝔐. See Definition 12.
Theorem 7 (Elementary and first-order theory equivalence [41]). Consider a first-order language 𝓛 and two of its models 𝔐 and 𝔑; then

𝔐 ≡ 𝔑 iff Th(𝔐) = Th(𝔑).
A special case of the Lefschetz Principle of first-order logic [5,57] states: The field 𝔸 of solutions of polynomial equations whose coefficients are from ℚ and the field of complex numbers ℂ have an elementary equivalence as described in Theorem 7. The field 𝔸 contains the roots of all polynomials with coefficients from ℚ. So, 𝔸 contains algebraic numbers that may be complex. There is a countably infinite number of polynomials with coefficients from ℚ, hence |𝔸| = ℵ₀. There is an uncountable number of complex numbers that are not roots of polynomial equations with coefficients from ℚ, so |ℂ| = 2^ℵ₀. Recall ℵ₀ < 2^ℵ₀.
This means sentences that are true in 𝔸 with cardinality ℵ₀ are also true in ℂ with cardinality 2^ℵ₀. That is, by Theorem 7, we have

Th(𝔸) = Th(ℂ) and 𝔸 ≡ ℂ.
Theorem 7 indicates that if there is an elementary equivalence between 𝔐 and 𝔑 with respect to the legal rules and facts, then both models support the same legal theory. This theorem also illuminates our system. Suppose a user input U is compatible with the first-order logic rules and regulations R of our theorem-proving system. The first-order facts, rules, and an interpretation are in the model 𝔐. LLMs may be applied to help build the ErgoAI set H where 𝔑 ⊨ₛ H. The ErgoAI in H is computed with (e.g., cosine) similarity along with any additional logical rules and facts. Assumption 2 indicates there is a countable model for H.
The next version of the Löwenheim–Skolem Theorems is from [41,55,56]. See also [52].
The symbol κ is a cardinal number. Cardinal numbers represent the cardinality of sets. For instance, ℵ₀ and ℵ₁ are two possible assignments for κ.
Theorem 8 (Löwenheim–Skolem (L-S) Theorems). Consider a first-order language 𝓛.
Upward: If the model 𝔐 is infinite and κ ≥ |𝔐|, then there is a model 𝔑 so that |𝔑| = κ and 𝔐 ⪯ 𝔑.
Downward: If the model 𝔑 is infinite and ℵ₀ ≤ κ ≤ |𝔑|, then there is a model 𝔐 so that |𝔐| = κ and 𝔐 ⪯ 𝔑.
Theorem 8 shows the existence of elementary extensions and elementary substructures. It does not give effective methods to generate these relationships. Therefore this theorem is stated in great generality, not to mention it is expressed using infinite sets.
A first-order language with a countably infinite model can suitably encode tax law and clarifications from the set Law. This is because tax law is written down. Thus, it must be countable, even considering that, over all time, the set Law may become of at most countably infinite cardinality.
The next corollaries apply to tax law as well as other areas. There are two cases of interest:
Both |𝔐| = ℵ₀ and |𝔑| = ℵ₀. This assumes a logic model with a countable number of meanings and atoms by Theorem 1. In logic terms, there are a countably infinite number of constant domain elements in 𝔐. It also assumes LLMs with only a countably infinite number of meanings and tokens.
|𝔐| = ℵ₁ and |𝔑| = ℵ₁. This assumes the model 𝔐 with an uncountable number of meanings for a countable number of atoms. Likewise, it assumes the model 𝔑 for LLMs with an uncountable number of meanings by Theorem 6 for a countable number of tokens.
The type of theorem-proving, resolution or SLD-resolution, has no impact on the next results. The central focus is on the models.
Corollary 1 (Application of Upward L-S). Consider a first-order language 𝓛 with an infinite model 𝔐 for a first-order logic theorem-proving system where |𝔐| = ℵ₀. Then there is a model 𝔑 so that 𝔐 ⪯ 𝔑 and 𝔐 ⊆ 𝔑 where |𝔑| = ℵ₁.
Proof. Apply Theorem 8 (upward) to 𝔐 and κ = ℵ₁ where |𝔐| = ℵ₀ ≤ κ. Since 𝔐 ⊆ 𝔑, we assume both of these models share σ from the language 𝓛.
The upward L-S theorem indicates there exists a model 𝔑 so that 𝔐 ⪯ 𝔑 and |𝔑| = ℵ₁. □
Applying Corollary 1 requires 𝔐 ⊆ 𝔑 so the semantics of 𝔐 carries over to 𝔑. We express this by saying both 𝔐 and 𝔑 share σ from 𝓛.
The case of interest to tax law is |𝔐| = ℵ₀ and |𝔑| = ℵ₁, so |𝔐| < |𝔑|. This assumes a logic model with only a countable number of meanings and atoms by Theorem 1. Alternatively, in logic terms, this model’s constant domain elements are at most countably infinite. It also assumes LLMs with an uncountable number of meanings by Theorem 6.
Corollary 2 (Application of Downward L-S). Consider a first-order language 𝓛 and a model 𝔑 so that |𝔑| = ℵ₁ for an LLM passing its output to a first-order theorem-proving system. Then there is a model 𝔐 so that 𝔐 ⪯ 𝔑 and 𝔐 ⊆ 𝔑 where |𝔐| = ℵ₀.
Proof. Apply Theorem 8 (downward) to 𝔑 and κ = ℵ₀ where κ ≤ |𝔑|. Since 𝔐 ⊆ 𝔑, we assume both of these models share σ from the language 𝓛.
Suppose we have an uncountable model 𝔑 based on the uncountability of LLM models by Theorem 6. That is, |𝔑| = ℵ₁. The downward Löwenheim–Skolem Theorem indicates there is a model 𝔐 where 𝔐 ⪯ 𝔑 and |𝔐| = ℵ₀. This completes the proof. □
To apply this corollary, we assume 𝔐 and 𝔑 share σ from 𝓛.
Continuing the case with (|𝔐| = ℵ₀ or |𝔐| = ℵ₁) and |𝔑| = ℵ₁: there is an equivalence between an LLM’s uncountable model and a first-order countable or uncountable logic model. This equivalence is based on the logical theory of each of these models. In other words, a consequence of Theorem 7 is next.
Corollary 3. Consider a language 𝓛 and the models 𝔐 and 𝔑, where 𝔐 ⪯ 𝔑 and 𝔐 and 𝔑 share σ from 𝓛. Then

Th(𝔐) = Th(𝔑).

This is the ideal case where 𝔐 and 𝔑 support the same theories.
5.2. Certifiable Tax Prover Architecture Sketch
This subsection gives a basic architectural sketch of our proof-of-technology CTP.
Figure 4. Client-server interaction for the CTP system
CTP is built on a cloud-based client-server architecture. The UI is a thin client since only user data entry and authentication are done on the client side. The computation is all on the server side. The server side is built as a set of microservices. The microservices will include,
1. User role management
2. LLM AI chat assistant
3. Tax law natural language text server
4. Tax law query builder
5. ErgoAI proof engine
Currently, our CTP has a basic highly structured user interface that allows a user to enter facts. The front end translates these facts into JSON. The JSON is sent to the backend which builds queries with the user-entered facts.
The proofs are computed by ErgoAI on our backend. Our backend is in the cloud.
Figure 4 provides a sketch of the main client-server exchange.
The front end is a React UI system. This React UI has users fill out details of their tax questions. We will have a bot that assists the user. This bot will be managed from the backend. The bot will work with the user on the front end to enter contemporary jargon into the CTP system. This contemporary jargon will be passed to the backend so it can be translated into suitable language for the theorem-prover.
The backend runs Java in a Spring-boot server. This server hosts all of the microservices. This Spring-boot server is running Tomcat. Currently a single service of the Tomcat server translates the JSON from the front end into ErgoAI. Then this same service calls the ErgoAI theorem prover on the inputs.
The Spring-boot server backend receives queries in JSON from the front end. These JSON queries are mapped to ErgoAI and executed by the Java backend. The answers are passed back to the React front end and presented using React. In the future, we will separately build the tax law query in ErgoAI in one service and search for the proof in another service.
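As an illustration of this JSON-to-ErgoAI step, here is a minimal sketch in Python (our backend performs this step in Java, and the JSON schema shown is hypothetical):

import json

# Hypothetical JSON schema for user-entered facts and a query; the real
# translation runs in our Java Spring Boot backend.
payload = json.loads("""
{
  "facts": [
    {"subject": "rent", "class": "forBusiness"},
    {"subject": "foodbank", "class": "donation"}
  ],
  "query": {"variable": "?X", "class": "forBusiness"}
}
""")

facts = [f"{fact['subject']} : {fact['class']}." for fact in payload["facts"]]
query = f"{payload['query']['variable']} : {payload['query']['class']}."
print("\n".join(facts))      # ErgoAI frame-syntax facts
print("ErgoAI> " + query)    # the query handed to the prover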
Of course, our architecture must deal with the semi-decidability of first-order logic or first-order Horn-clause logic. See Theorem 2. ErgoAI has several features to lower the chances of infinite cycles while trying to prove false sentences. Nonetheless, our system must handle any implausible situation gracefully.
6. Discussion
This paper aims to give a better understanding of multimodal AI. Particularly, the two modal systems discussed here are LLMs and first-order logic theorem-proving systems. The first-order theorem-proving systems we are using are from ErgoAI or Prolog. We discuss general features of LLMs.
Often mathematical limits are over infinite domains. For example, lim_{x→∞} f(x), for a mathematical function f. In discussing LLMs, our limit is over the infinite time domain. Figure 6 shows this limit assuming an annual linear increase of about 8,000 new meanings per year. This infinite time domain goes forever forward in time. The purpose of this discussion is to better understand and reason about the multimodality of our system. This is just as the foundations of computer science have given us a much better understanding of physical computers, where many times the foundations of computer science assume machines with arbitrarily large or infinite capacity.
Even though all humans are mortal, these results may give insight into their experience due to the domains of the two modalities of our multimodal system. It is germane, since LLMs train on very large data sets. Currently, these data sets can be as large as trillions of word instances or tokens. In contrast, the number of facts and rules in U.S. tax law and clarifications, Law, presently requires millions of word instances.
Consider an uncountable model or number of words or meanings from LLMs by Theorem 6. Corollary 2 indicates such an uncountable model can work with a theorem-proving system using a countably infinite model. This is very interesting in light of originalism for the law in Assumption 1. Furthermore, a few of the rules and principles of tax law are similar to those from 1,900 years ago. This indicates some semantics remains over a long time, even in different languages and in different eras.
Some LLMs have more than ten thousand dimensions for their feature sets. Although the number of dimensions for feature sets has been growing with new LLMs, there is also research to constrain the number of dimensions. Consider this in contrast to our assumption of countably infinite feature sets.
To apply the logical model theory, when the two models are related as 𝔐 ⪯ 𝔑, the semantics of 𝔐 must coincide with the semantics of 𝔑. Defeasible reasoning may be applied during theorem-proving to essentially enhance semantic compatibility.
One classical interpretation of the upward Löwenheim–Skolem Theorem is that first-order logic cannot distinguish higher levels of infinity. That is, first-order logic cannot differentiate sets with distinct cardinalities ℵᵢ, for i ≥ 0. It is also widely discussed that first-order logic cannot make statements about subsets of its domain [59]. This has impact on differentiating LLMs and theorem-proving systems assuming originalism.
Our use of the Löwenheim–Skolem Theorems and our discussion of Lefschetz Principle sheds light on originalism.
We also mention the classical notion that there is a countable number of algorithms that can be listed. Yet, it is possible to diagonalize over this list, showing there must be functions not computed by any algorithm in the list. This discrepancy is rectified by results based on uncomputable numbers, such as part 2 of Theorem 2. Meanings, in the sense we discuss here, may not tie into algorithms.