Engram: A Systematic Approach to Optimize Keyboard Layouts for Touch Typing, With Example for the English Language

Most computer keyboard layouts (mappings of characters to keys) do not reflect the ergonomics of the human hand, resulting in preventable repetitive strain injuries. We present a set of ergonomics principles relevant to touch typing, introduce a scoring model that encodes these principles, and outline a systematic approach for developing optimized keyboard layouts in any language based on this scoring model coupled with character-pair frequencies. We then create a keyboard layout optimized for touch typing in English by constraining key assignments to reduce lateral finger movements and enforce easy access to high-frequency letters and letter pairs, applying open source software to generate millions of layouts, and evaluating them based on Google’s N-gram data. We use two independent scoring methods to compare the resulting Engram layout against 10 other prominent keyboard layouts based on a variety of publicly available text sources. The Engram layout scores consistently higher than other keyboard layouts.

example, some keyboards are split into left- and right-hand sides and angled to reduce bending of the wrists (Zipp et al., 1983; Çakir, 1995; Zecevic et al., 2000; Rempel et al., 2007; Rempel et al., 2009), are rounded to conform to the shape of the hand (Gerard et al., 1994; McLoone et al., 2009), position high-access keys in the middle for easy reach by the thumbs (Gerard et al., 1994), and arrange keys into perpendicular rows and columns to reduce diagonal finger movements (see Figure 2). Some also permit a choice of key switches, whose force-displacement characteristics can impact strain and fatigue (Rose, 1991; Rempel et al., 1997; Radwin et al., 1999; Bufton et al., 2006; Lee et al., 2009). The Kinesis Advantage (Gerard et al., 1994) and the Ergodox are two examples of commercial keyboard designs that also permit remapping of characters to individual keys, and therefore enable completely customizable keyboard layouts.
Since the vast majority of people simply use the keyboard bundled with their computer or physically integrated into their laptop, adopting a better keyboard layout has the greatest potential to significantly improve comfort and reduce strain for the greatest number of people who touch type. And to counter the resigned statement "It's too difficult for people to switch keyboards," it is important to recognize: (1) there are millions of people for whom it would not be a switch, including every new generation, (2) many languages do not yet have a well-established keyboard layout, and (3) people who suffer or do not wish to suffer repetitive strain injuries from typing but need to type have a vested interest in improving the ergonomics in their lives.

Prominent keyboard layouts
Since the invention of the typewriter, the Sholes ("Qwerty") keyboard layout has been the preeminent layout, despite being generally acknowledged as inferior to many of its competitors (Amell and Kumar, 2000). For a thorough account of the history of keyboard layouts, see (Martin, 1981; Noyes, 1983). The Qwerty layout and its cousins were designed without apparent regard for the efficiency of touch typing common bigrams (two-letter sequences) in English. New layouts have been introduced to allow for more efficient touch typing in English over the last century (see Table 1), but most did not attempt to optimize their layouts. Indeed, for the sake of making it easier to switch from touch typing with a Qwerty layout, most retain elements of Qwerty to make them more efficient for Qwerty users to learn. Exceptions include layouts generated by CarpalX's simulated annealing of typing effort models (Krzywinski, 2015) and Halmak's genetic algorithm (Nemshilov, 2016), both of which are included in our analysis. In the present study, we are not concerned with what a user has already learned and used in the past, but focus on designing new, optimal keyboard layouts for standard and orthonormal keyboard designs. Our principal concern is comfort, but we hypothesize that careful, systematic choices that account for basic ergonomics principles of finger length, strength, and movement that reduce strain can also improve efficiency, in terms of typing speed and accuracy. Dvorak undertook one of the most comprehensive reevaluations of the keyboard layout without attempting to optimize, and defined eleven criteria for the design and evaluation of keyboard layouts, which can be summarized as follows:
(1) Alternate between hands.
(2) Balance finger loads, and avoid using the same finger.
(3) Avoid the upper and lower rows, and avoid skipping over the home row.
(4) Avoid tapping adjacent rows with the same or adjacent fingers.
Most keyboard layouts that purport to improve typing efficiency, including Dvorak's, demand undue strain on tendons, particularly lateral extension of the index and little fingers. The index fingers must reach to the center columns to access letters, and the right little finger must reach to the periphery of the keyboard to access various punctuation. Following Dvorak, many layouts over-emphasize alternation between hands and under-emphasize same-hand, different-finger transitions. However, same-row, adjacent-finger transitions are arguably easier, more comfortable, and faster than alternating hands to type keys that are far removed from one another. Many layouts also ignore the ergonomics of the human hand: different finger lengths and strengths, roundedness of the hand, relative ease of little-to-index finger roll-ins vs. index-to-little finger roll-outs, etc. What is of greatest concern is that most proposed layouts are not based on established open access data, do not provide reproducible evidence for their superiority, and are not published in a peer-reviewed journal for scrutiny by the scientific community. When evaluations are conducted, they are often restricted to comparisons against the Qwerty layout and not to layouts that are known to be more efficient or redesigned based on ergonomics principles.

Data on typing strength and speed
Data on finger strength and typing speed can be useful for informing the design of a keyboard layout, if the data are relevant and applied appropriately. Table 1 of an article describing the design of a keyboard layout for the Filipino language (Salvo et al. 2016) presents the "average finger strength of Filipinos [n=30, ages 16-36] measured in pounds": the index, middle, ring, and little fingers on the right hand were measured to produce 6.09, 6.37, 5.08, and 4.27 pounds of force, respectively, and on the left hand 6.57, 5.65, 4.54, and 3.77 pounds. However, these measurements probably don't represent relative strength relevant for typing: "Respondents were asked to sit in upright position, with their wrists resting on a flat surface. A pinch gauge was placed within each finger's reach. The respondents were asked to exert maximum pressure on the device." Martin et al. (1996) used finger flexor electromyograms to measure finger strengths. From Table 4 of their publication, the index, middle, ring, and little fingers were measured to produce 2.26, 2.36, 2.02, and 1.84 Newtons, respectively. These were based on peak keyboard reaction forces, however, and may not reflect the amount of force a typist would normally or should apply when typing. We factor in strength data in our study primarily for initial placement of the most frequent letters.
Likely the most relevant study regarding typing speed for keyboard layout optimization was conducted by İşeri and Ekşioğlu (2015). While the study generated data on typing speed, care must be taken when using these measurements, as they were collected from right-handed typists on a conventional QWERTY keyboard, without regard to the order in which pairs of letters were typed. We factor in speed data in our study for early experimentation and validation, and we averaged data from the left and right hands to compensate for right-handedness of their study participants.

Letter, n-gram, and punctuation frequency
Perhaps the most extensive analysis to date of n-gram frequencies (where an n-gram is a string of n adjacent letters) on the largest general English text corpus (maintained by Google) was conducted by Peter Norvig (2012).
Table 2 aggregates the results of different studies that measured the frequencies of punctuation marks in publications and communications in English and in software programs. The table includes the 12 most frequent punctuation marks from each of the studies described below.
Sun et al. (2018) published statistical values of punctuation frequency in 20 English-speaking countries from large-scale text corpora. The data were acquired through GloWbE, "a large English corpus collecting international English from the internet, containing about 1.9 billion words of text from twenty different countries. For further information on the corpora used, see https://corpus.byu.edu/." Table entries are average frequencies per one million characters.
Malik and Findlater (2013) analyzed the frequency of punctuation input on touchscreen keyboards using Twitter as compared against Google N-grams. They found that punctuation in mobile tweets comprised 7.5% of characters versus only 4.4% in the Google corpus. Only six punctuation symbols appeared more frequently than the letter Q in the Google corpus, and the comma was not included. Table entries are in percentage of total characters from the Google N-gram corpus (version 1 containing 472,764,897 characters in English books published between the years 1538 and 2008), and from 173,876 mobile tweets (uniformly sampling 1% of the public tweet stream from June 2012).
Cook (2013) published on the frequencies of English punctuation based on a corpus of about 459,000 words, including three novels (276,000 words), selections of articles from two newspapers (55,000 words), one bureaucratic report (94,000 words), and assorted academic papers on language (34,000 words). Table entries are average frequencies per 1,000 characters.
Ruhlen and Pressey (1924) published a statistical study of what was then current usage in punctuation. Table entries are in frequencies per 10,000 words, drawn from 38,638 words from 100 business letters, 50 professional letters, and excerpts from one issue each of several newspapers and magazines. While it is limited in scope, it provides confirmation of later and larger studies.
Lee (2013) published online an analysis of the frequencies of punctuation in different software programming languages. Table entries are percent frequencies across all programming languages they included in their study, as well as the popular C and Python programming languages. While some of the punctuation is biased toward C (19.8%) and Python (18.5%), which make considerable use of the underscore, there is some consistency across languages for the most frequent punctuation.

A new layout
In this work, we outline a set of ergonomics principles related to touch typing, apply a systematic approach to designing optimal keyboard layouts that follow these ergonomics principles, and provide open source software for generating optimal keyboard layouts based on this systematic approach. To generate an example layout for optimal touch typing in English, we made use of publicly available, massive bigram frequency data for the English language (Norvig, 2012). The result is the "Engram" layout (Figures 1 and 2), available for anyone to use on multiple platforms via the Keyman application. We compare this layout with the layouts in Table 1. (The name "Engram" is a pun, referring both to "n-gram", letter permutations and their frequencies that are used to compute the layout, and "engram", or memory trace, the postulated change in neural tissue to account for the persistence of memory, as a nod to the attempt to make this layout easy to remember.)

Methods
In this section, we will introduce a set of ergonomics principles for touch typing, define the general organization of a keyboard layout to minimize lateral finger movements, and suggest arrangements for the most frequent letters in English to keys on a keyboard based on comfort and bigram frequencies. We then present our Engram scoring model to optimize the arrangement of the remaining letters. After assigning non-letter characters to the remaining keys, we describe stability tests and a comparison of our resulting layout against other layouts listed in Table 1.

Ergonomics principles for touch typing
Inspired by Dvorak, but determined to overcome its drawbacks, we introduce a set of 12 ergonomics principles for touch typing, interspersed with the design decisions and scoring model parameters that we use in the present study and detailed below.

Ergonomics principles with design decisions (-) and scoring model parameters (+):
1. Assign letters to keys that don't require lateral finger movements.
- 24 letters constrained to home columns (2.2.1)
2. Promote alternating between hands over uncomfortable transitions with the same hand.
- vowels and most common consonants separated to either side (2.3.1, 2.3.3)
+ scoring model parameters do not penalize bigrams that cross sides (2.4.1)
3. Assign the most common letters to the most comfortable keys.
- most common letters in the home, top-center, and bottom corner keys (2.2.2)
- most common vowel in the strongest left finger position (2.3.1)
- most common consonants in the strongest right finger positions (2.3.3)
+ implicit in the scoring model (2.4.1)
4. Arrange letters so that more frequent bigrams are easier to type.
- highest-frequency bigrams arranged on the most comfortable keys (2.3.4)
+ basis of the scoring model (2.4.1)
5. Promote little-to-index-finger roll-ins over index-to-little-finger roll-outs.
- highest-frequency bigrams ordered for roll-ins from little to index finger (2.3.1, 2.3.3)
+ scoring model penalizes roll-outs ("rollout" parameter in 2.4.1)
6. Balance finger loads according to their relative strength.
+ scoring model optionally weights keys according to normalized finger strength (2.4.2)
7. Avoid stretching shorter fingers up and longer fingers down.
+ scoring model explicitly penalizes this ("out_hi_in_lo1", "out_hi_in_lo2" in 2.4.1)
8. Avoid using the same finger.
+ scoring model penalizes same-finger bigrams ("2x" in 2.4.1)
9. Avoid the upper and lower rows.
+ scoring model penalizes upper and lower rows ("not_home" in 2.4.1)
10. Avoid skipping over the home row.
+ scoring model penalizes bigrams with an upper and lower row letter ("skip" in 2.4.1)
11. Assign the most common punctuation to keys in the middle of the keyboard.
- placement accounts for frequency, logical grouping, and ease of recall (2.7)
12. Assign easy-to-remember symbols to the Shift-number keys.
- placement accounts for logical grouping and ease of recall (2.7)

2.2. Defining the shape of the keyboard layout to minimize lateral finger movements
Before addressing the arrangement of characters on a keyboard, it is important to define the shape of clusters of characters so that finger movements are constrained in beneficial ways. To clearly describe our approach, we will refer to the numbering scheme below. The first nine column numbers equal the number keys 1 through 9 at top, and continue increasing from left to right, and row numbers increase from bottom to top. For example, the left index finger on the home row is situated above the key in the 4th column on the 2nd row (42), and the right little finger on the home row is situated above the key in the 10th column on the 2nd row (102). The numbering scheme is portrayed below in the orthonormal (perpendicular) layout of some keyboards for clarity, but this numbering scheme is perfectly applicable to standard diagonal keyboards as well.
The eight "home columns" (1 through 4 and 7 through 10) contain the eight fingers' "home" keys (12, 22, 32, and 42 on the left and 72, 82, 92, and 102 on the right). We assign 24 of 26 letters to the keys in the 8 home columns, separated by two middle columns reserved for punctuation. These 8 home columns require no lateral finger movements when touch typing on an orthonormal keyboard, and minimal lateral movements on a standard, diagonal keyboard, since each column is accessed by one finger. Due to the natural roundedness of the hand, the most easily accessible keys from the home row (2) include the home keys, the top-center keys for each hand (23 and 33 on the left, 83 and 93 on the right) that allow the longer middle and ring fingers to uncurl upwards, as well as the bottom corner keys (11 and 41 on the left, 71 and 101 on the right) that allow the shorter fingers to curl downwards. We will assign the two least frequent letters, Z and Q, to the 11th column (keys 112 and 113).
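As a minimal illustration (not the software used in this study), the key numbering and the key groups described above could be enumerated as follows. The function and variable names are hypothetical; only the key numbers themselves come from the text.

```python
# Hypothetical helper names; only the key numbers come from the text above.
HOME_COLUMNS = [1, 2, 3, 4, 7, 8, 9, 10]   # columns 5 and 6 are the middle columns
ROWS = [1, 2, 3]                            # bottom, home, top

def key_id(column, row):
    """Concatenate column and row numbers, e.g. column 10, row 2 -> '102'."""
    return f"{column}{row}"

home_column_keys = [key_id(c, r) for c in HOME_COLUMNS for r in ROWS]
home_keys = [key_id(c, 2) for c in HOME_COLUMNS]
top_center_keys = ["23", "33", "83", "93"]
bottom_corner_keys = ["11", "41", "71", "101"]

# Keys described as most easily accessible from the home row.
most_accessible_keys = home_keys + top_center_keys + bottom_corner_keys
least_accessible_keys = [k for k in home_column_keys
                         if k not in most_accessible_keys]

print(len(home_column_keys), len(most_accessible_keys), len(least_accessible_keys))
# prints: 24 16 8
```

The counts (24 keys in the home columns, of which 16 are most easily accessible and 8 are not) follow directly from the key lists given in the text.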

Arranging the most frequent letters based on comfort and bigram frequencies
Given that there are over 600 sextillion (24! = 6.204484017 E+23) possible arrangements for the 24 letters in the home columns, which is currently computationally intractable, we will arrange the letters in stages, based on letter frequency and letter-pair (bigram) frequency in the English language, according to Norvig's analysis (Norvig 2012). To apply the following approach to optimize the layout for a different language, or optimize according to a different corpus than the one used by Google, you would run the open source software used in this study on your preferred bigram frequency data.
In prior experiments using the same methods as below but with fewer constraints applied, over 200 million different layouts were evaluated, and all vowels consistently automatically clustered together to the left side of the keyboard. Here we will initialize with vowels on the left side and the most frequent consonants on the right side to encourage balance and alternation across hands. We will reevaluate the layout and its mirror image when we consider keys 112 and 113, which break the bilateral symmetry of the layout of the letters.

Letters in descending order of frequency in English, highlighting vowels:
We first assign the five vowels (E, A, O, I, U) to the most easily accessible keys from the home keys on the left side (see 2.2.2). Given that the middle and index fingers are the strongest (Martin et al. 1996), we assign the letter E, the most frequent in the English language, to the middle or index finger position in the left home row (keys 32 and 42). Bigrams with more than 1 billion instances in Peter Norvig's analysis of Google data are: OU, IO, EA, IE , AI, IA, EI, UE, UA, AU, UI, OI, EO, OA, OE (bigrams with more than 10 billion instances are in bold). We will initialize the arrangement of the vowels so that the top four bigrams read from left to right (for example, IO with 23.5 billion instances vs. OI with 2.5 billion instances, and IE with 10.8 billion instances vs. EI with 5.2 billion instances). This will prioritize left-hand roll-ins from little to index finger over the slower, less comfortable roll-outs from index to little finger. These initial constraints result in six comfortable and efficient layouts (2.3.2).

Vowel layouts:
Left-hand keys:
 - 23 33  -
12 22 32 42
11  -  - 41
Next, to populate the home row on the right side of the keyboard, we examine all possible sequences of four letters from the eight most frequent consonants. Each of these consonants has at least 100 billion (at least 3% of) instances in Peter Norvig's analysis, and with the vowels, they cover half the alphabet.

Letters in descending order of frequency in English, highlighting high-frequency consonants:
These eight consonants are also included among the top-frequency bigrams, with more than 10 billion instances: TH, ND, ST, NT, CH, NS, CT, TR, RS, NC. To facilitate typing these bigrams with 4 fingers on the right-hand home keys, we select 4-consonant sequences that contain at least four top-frequency bigrams, such as NRST (NS, NT, RS, RT, ST). As with the letter E above, we assign the letter T, the second-most frequent letter in the English language, to the middle or index finger position in the right home row (keys 82 and 72). The resulting 4-consonant sequences are: NSTH, NSTR, NCTH, NCTR, and NRST. The resulting six arrangements of five vowels on the left and five arrangements of four consonants on the right give us 30 initial layouts, each with nine letters assigned to keys and 15 unassigned keys. In 2.3.4 below, the three rows on the left and right side of the keyboard are represented as a linear string of letters, with unassigned keys denoted by "-". All sequences on the right side will be reversed so that they read from right to left for ease of typing (right-hand roll-in from little to index finger vs. roll-out from index to little finger).
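The bigram criterion can be checked with a short sketch. It assumes that a sequence "contains" a bigram whenever the two letters appear in left-to-right order in the sequence, adjacent or not; this interpretation and the function name are assumptions for illustration, and the sketch does not reproduce the full selection procedure.

```python
from itertools import combinations

# Top-frequency consonant bigrams listed in the text above.
TOP_BIGRAMS = {"TH", "ND", "ST", "NT", "CH", "NS", "CT", "TR", "RS", "NC"}

def top_bigrams_in(sequence):
    """Return the listed bigrams formed by ordered letter pairs in the sequence."""
    # combinations() preserves left-to-right order, so "NRST" yields
    # NR, NS, NT, RS, RT, ST as candidate pairs.
    return [a + b for a, b in combinations(sequence, 2) if a + b in TOP_BIGRAMS]

for seq in ["NSTH", "NSTR", "NCTH", "NCTR", "NRST"]:
    print(seq, top_bigrams_in(seq))
```

Under this reading, each of the five selected sequences contains at least four of the listed bigrams; for NRST these are NS, NT, RS, and ST.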

Engram scoring model
To optimize keyboard layouts, we need to score and evaluate different layouts using a scoring model. We have developed a scoring model that formalizes our ergonomic principles (2.1) for use in an optimization algorithm. Our optimization algorithm finds every permutation of a given set of letters (or, more generally, characters), maps these letter permutations to a set of keys, and scores these letter-key mappings. The score for a given layout is the average of the scores for all possible bigrams in the layout. The score for each bigram is a product of the frequency of occurrence of that bigram and flow (and optional strength and speed) factors. The flow factors only need to be computed once to optimize or evaluate any layout, assuming a consistent keyboard design. The optional strength and speed factors likewise only need to be computed once, but since they are empirically derived, are subject to change based on the representativeness of the data. As mentioned in the introduction, we factor in strength data (Martin et al., 1996) in this study for initial assignment of the most frequent letters to the home keys, and we factor in speed data in this study for early experimentation and validation based on averaged left-and right-hand data from İşeri and Ekşioğlu (2015). The Engram layout is based primarily on flow factors and bigram frequencies for key pairs. Since pairwise comparisons naturally lend themselves to a matrix representation, we will present matrix heatmap visualizations to indicate higher values as darker colors for pairs of keys in column-by-row positions in the heatmap. Density histograms present the number of key pairs for each value (out of 576 pairs for the 24 letters in the home columns). When computing scores for keyboard layouts in this study, we multiply a matrix of bigram frequencies with a flow matrix. We describe below optional strength and speed matrices that we only use for early testing and for validation.
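The scoring step described above can be sketched in a few lines. This is an illustrative reimplementation under stated assumptions, not the software used in the study: the function name, the data structures, and the toy numbers are all hypothetical, and only the flow factor is applied (no strength or speed factors).

```python
def score_layout(letters, bigram_freq, flow):
    """Average of frequency x flow over all ordered pairs of assigned keys.

    letters:     list where letters[i] is the letter assigned to key i
    bigram_freq: dict mapping two-letter strings to relative frequencies
    flow:        square list of lists; flow[i][j] is the flow factor
                 for typing key i followed by key j
    """
    n = len(letters)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += bigram_freq.get(letters[i] + letters[j], 0.0) * flow[i][j]
    return total / (n * n)

# Toy example with three keys and made-up numbers (not data from the study).
toy_flow = [[1.0, 1.0, 0.9],
            [0.9, 1.0, 1.0],
            [0.9, 0.9, 1.0]]
toy_freq = {"TH": 0.5, "HE": 0.4, "ET": 0.1}

print(score_layout(["T", "H", "E"], toy_freq, toy_flow))
print(score_layout(["E", "H", "T"], toy_freq, toy_flow))
```

In the toy example, the first arrangement places the frequent bigrams TH and HE on unpenalized transitions and therefore scores higher than the second. For the 24 keys in the home columns, the double loop visits the 576 key pairs mentioned above.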

Flow
Flow is a measure of the ease of a finger transition from the first in a pair of keys to the second. All of the difficult key positions and key transitions listed below lower the score by the same factor (0.9), except for typing two different keys in succession with the same finger (0.81). In one of our stability tests, we remove each individual parameter to evaluate how sensitive the winning layout is to that parameter. These parameters are not mutually exclusive, and accumulate as key positions and transitions become progressively more difficult (see below for examples).
Difficult key positions:
• out_hi: shorter (index or little) finger on top row
• in_lo: longer (middle or ring) finger on bottom row
Difficult key transitions:
• rollout: roll out from index to little finger (or other similarly outward directed pair)
• 2x: use same finger twice for a non-repeating letter
• out_hi_in_lo1: index above middle finger, or little above ring finger
• out_hi_in_lo2: index above ring finger, or little above middle finger
• ring_hi_mid_lo: ring above middle finger
• not_home: at least one key not on home row
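To show how such penalties can accumulate, the following simplified sketch computes a flow factor for a pair of keys typed by the same hand. It covers only a subset of the parameters above (out_hi, in_lo, rollout, 2x, not_home), and the key representation, function name, and exact way the penalties are combined are assumptions for illustration rather than the study's implementation.

```python
# Fingers are numbered from little (1) to index (4); rows from bottom (1) to top (3).
LITTLE, RING, MIDDLE, INDEX = 1, 2, 3, 4
BOTTOM, HOME, TOP = 1, 2, 3

def flow_factor(key1, key2, penalty=0.9, same_finger_penalty=0.81):
    """Multiply penalties for a same-hand transition from key1 to key2.

    Each key is a (finger, row) tuple. Simplified sketch: only a subset
    of the parameters listed in the text is modeled.
    """
    (finger1, row1), (finger2, row2) = key1, key2
    factor = 1.0
    for finger, row in (key1, key2):
        if row == TOP and finger in (INDEX, LITTLE):    # out_hi
            factor *= penalty
        if row == BOTTOM and finger in (MIDDLE, RING):  # in_lo
            factor *= penalty
    if finger2 < finger1:                               # rollout (toward little finger)
        factor *= penalty
    if finger1 == finger2 and row1 != row2:             # 2x (same finger, different key)
        factor *= same_finger_penalty
    if row1 != HOME or row2 != HOME:                    # not_home
        factor *= penalty
    return factor

print(flow_factor((LITTLE, HOME), (INDEX, HOME)))   # home-row roll-in
print(flow_factor((INDEX, HOME), (LITTLE, HOME)))   # home-row roll-out
print(flow_factor((MIDDLE, HOME), (MIDDLE, TOP)))   # same finger, leaves home row
```

In this sketch a home-row roll-in is unpenalized, a home-row roll-out is penalized once, and a same-finger move off the home row accumulates both the same-finger and not_home penalties.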

Strength
We define strength simply as the normalized empirical measurements of finger strengths from Martin, et al. (1996) without regard for the positions of these fingers. Therefore, the key pairs with the highest strength values are predictably all pairs that use either or both of the strongest (middle) fingers, and pairs with the lowest strength values are all pairs that use either or both of the weakest (little) fingers.
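As a small worked example, the peak forces quoted earlier from Table 4 of Martin et al. (1996) can be normalized per finger. Dividing by the strongest finger's value is an assumption made here for illustration; the text does not specify the normalization used, and the variable names are hypothetical.

```python
# Peak forces in Newtons, as quoted earlier from Table 4 of Martin et al. (1996).
finger_force = {"index": 2.26, "middle": 2.36, "ring": 2.02, "little": 1.84}

# Assumption: normalize by the strongest finger so that values fall in (0, 1].
strongest = max(finger_force.values())
normalized_strength = {finger: force / strongest
                       for finger, force in finger_force.items()}

for finger, value in normalized_strength.items():
    print(f"{finger}: {value:.3f}")
```

With this normalization the middle finger has the value 1 and the little finger has the lowest value, consistent with the ordering described above.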

Speed
We define speed as the left-right averaged measurements of interkey timings from İşeri and Ekşioğlu (2015) without regard for the order of the key presses. Interkey speed values decrease roughly in order from index to little fingers starting from the home keys. Not surprisingly, the key pairs with the lowest interkey speed values (21 to 13, 91 to 103) are the same key pairs with the lowest flow values. Figure 5 shows a heatmap visualization of the speed factors for all 576 pairs of the 24 keys in the home columns, and Figure 6 shows a frequency histogram of these speed values.

Optimizing assignment of the remaining letters
Using the above scoring model, we can evaluate different letter assignments for the 15 missing letters in each of the 30 fixed initialized layouts from 2.3.4. Since there are over 1.3 trillion possible ways of arranging 15 letters (15! = 1,307,674,368,000), we will break up the assignment into two stages. First we will compute scores for every possible arrangement of the seven most frequent remaining letters (2.5.1) in the seven most comfortable remaining key positions (2.5.2, from 2.2.2, including either 22 or 23 depending on the layout). Second, we will compute scores for every possible arrangement of the 8 least frequent letters (aside from Q and Z; 2.5.3) in the 8 least comfortable key positions (2.5.4), after substituting in the results from the first stage. For the 7 most frequent remaining letters, there are 7! = 5,040 possible permutations, and we have 30 initial layouts, so we need to score and evaluate 151,200 layouts. As mentioned above, to score each layout, we construct a frequency matrix of each ordered pair of letters based on the key assignments, and multiply this frequency matrix by a precomputed flow matrix. For the 8 least frequent remaining letters (aside from Q and Z), there are 8! = 40,320 possible permutations, and we have 30 layouts, so we need to score and evaluate 1,209,600 more layouts.
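The layout counts in this section can be verified with a few lines of arithmetic. The variable names are illustrative only.

```python
from math import factorial

initial_layouts = 6 * 5                    # vowel arrangements x consonant sequences
stage1 = factorial(7) * initial_layouts    # 7 most frequent remaining letters
stage2 = factorial(8) * initial_layouts    # 8 least frequent remaining letters

print(f"15! = {factorial(15):,}")              # 15! = 1,307,674,368,000
print(f"stage 1: {stage1:,}")                  # stage 1: 151,200
print(f"stage 2: {stage2:,}")                  # stage 2: 1,209,600
print(f"staged total: {stage1 + stage2:,}")    # staged total: 1,360,800
```

Splitting the search into two stages reduces the number of layouts to score from over a trillion per initialization to about 1.36 million in total, at the cost of not exploring arrangements that mix letters between the two stages.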

Further optimization by exchanging letters
If we relax the above fixed initializations and permit further exchange of letters, then we can search for even higher-scoring layouts. As a final optimization step we exchange letters, eight keys at a time, to score a total of 21,772,800 more combinations:

Non-letter characters
In addition to the 26 letters, we will address the key assignment of non-letter characters, taking into account frequency of punctuation, logical grouping, and ease of recall.
We will assign the most frequent punctuation (from 1.3) to the six keys in the middle two columns: The Shift key will access related punctuation marks (shown as left and right pairs below). The left middle column will contain punctuation for separating and joining text and the right middle column will contain punctuation for closing text.

Separating marks (left middle column):
The comma separates text in lists; the semicolon can be used in place of the comma to separate items in a list (especially if these items contain commas); open parenthesis sets off an explanatory word, phrase, or sentence.

Joining marks (left middle column):
The apostrophe joins words as contractions; the hyphen joins words as compounds; the underscore joins words in cases where whitespace characters are not permitted (such as in variables or file names).

Closing marks (right middle column):
A sentence usually ends with a period, question mark, or exclamation mark. The colon ends one statement but precedes the following: an explanation, quotation, list, etc. Double quotes and close parenthesis close a word, clause, or sentence separated by an open parenthesis.

Number keys:
We reserve the entire number key row for mathematical and logic symbols.
The three remaining keys in many common keyboards (flanking the upper right-hand corner Backspace key) are displaced in special keyboards, such as the Kinesis Advantage and Ergodox. For the top right key, we will assign the forward slash and backslash (/ \). For the remaining two keys, we will assign two symbols that in modern usage have significance in social media: the hash/pound sign (#) and the "at sign" (@). The hash or hashtag identifies digital content on a specific topic (the Shift key accesses the dollar sign). The "at sign" identifies a location or affiliation (such as in email addresses) and acts as a "handle" to identify users in popular social media platforms and online forums.

Stability Tests
We will run three stability tests on the winning layouts:
1. Compare score of the winning layout after rearranging random letters
2. Compare ranking of all final layouts based only on interkey speed
3. Compare ranking of all final layouts after removing each scoring parameter
The first test is to see if allowing random sets of letters to rearrange in every possible combination improves the score of the winning layout. We repeat this test 1,000 times, randomly selecting eight of the 24 letters, and another 1,000 times, randomly selecting eight of the 16 letters in the non-home rows, for a total of 80,640,000 additional layout tests.
In the second test, we rescore all of the final layouts, replacing the flow matrix with the inter-key speed matrix to see if this affects their ranking. In the third test we remove each Engram scoring parameter one at a time and rescore all of the final layouts to see if this affects their ranking.
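The first stability test can be sketched as follows. This is an illustrative outline, not the study's code: the function name, its parameters, and the toy scoring function are hypothetical. With a subset size of 8, each trial scores 8! = 40,320 candidate layouts, so 2,000 trials give the 80,640,000 layouts mentioned above.

```python
import random
from itertools import permutations

def perturbation_test(layout, score, subset_size, trials, seed=0):
    """Permute random subsets of positions; return the best layout and score found.

    The layout is considered stable under this test if the returned layout
    equals the input layout.
    """
    rng = random.Random(seed)
    best_layout, best_score = list(layout), score(layout)
    for _ in range(trials):
        positions = rng.sample(range(len(layout)), subset_size)
        letters = [layout[p] for p in positions]
        for perm in permutations(letters):
            candidate = list(layout)
            for p, letter in zip(positions, perm):
                candidate[p] = letter
            s = score(candidate)
            if s > best_score:
                best_layout, best_score = candidate, s
    return best_layout, best_score

# Toy scoring function (not the Engram model): count letters in alphabetical position.
def toy_score(layout):
    return sum(1 for a, b in zip(layout, sorted(layout)) if a == b)

winner = list("ABCDEF")
result, result_score = perturbation_test(winner, toy_score, subset_size=3, trials=20)
print(result == winner, result_score)
```

In the toy example the starting layout is already optimal under the toy score, so no rearrangement improves on it and the test returns it unchanged.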

Comparing the resulting layout against other layouts
In all of the optimization and evaluation of layouts in the service of developing the Engram layout, we score layouts using Engram's scoring model and Google's bigram frequency data. However, as mentioned previously, we can use any bigram frequency data, including such data derived from any text corpus. To compare the winning layout against the layouts in Table 1, we have compiled a data set of large, representative, publicly available text data, with sources listed in Table 4 and available from a public GitHub repository (https://github.com/binarybottle/text_data). We will score each layout on Google's N-gram data and on each of the text sources in Table 4 using the Engram scoring model, and compare these with scores generated by the online Keyboard Layout Analyzer (http://patorjk.com/keyboard-layout-analyzer/) using its own much simpler scoring method: "The optimal layout score is based on a weighted calculation that factors in the distance your fingers moved (33%), how often you use particular fingers (33%), and how often you switch fingers and hands while typing (34%)."
Table 4. Text data.
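Deriving bigram frequency data from a text source, as described above, can be sketched as follows. The function name and the tokenization choices (uppercasing, splitting on whitespace, dropping non-letter characters within a word) are assumptions for illustration and may differ from how the frequency data in this study were prepared.

```python
from collections import Counter

def bigram_frequencies(text):
    """Relative frequencies of adjacent letter pairs within words."""
    counts = Counter()
    for word in text.upper().split():
        # Assumption: non-letter characters are dropped before pairing letters.
        letters = [ch for ch in word if ch.isalpha()]
        counts.update(a + b for a, b in zip(letters, letters[1:]))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bigram: n / total for bigram, n in counts.items()}

freqs = bigram_frequencies("the theme of the thesis")
for bigram, freq in sorted(freqs.items(), key=lambda kv: -kv[1])[:3]:
    print(bigram, round(freq, 3))
```

The resulting dictionary of relative frequencies can then be used in place of Google's bigram data when scoring a layout.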

Results
In the results below, we computed the total score for a keyboard layout as the average of the scores for every pair of keys. For optimization of the arrangement of the letters in the Engram layout, we computed over 23 million scores for different arrangements of the 24 letters in the home columns. For comparisons against other keyboard layouts, none of which reserve all of the keys in the middle columns for non-letter characters, we instead computed scores for 32 keys, including the 24 letters in the home columns, the middle columns, and keys 112 and 113.

Optimized layouts
Of the 1,360,800 layouts generated from the 30 fixed initializations (2.5), the top-scoring layout contains a different sequence of home-key consonants than in any of the initializations:
After permitting the further exchange of letters (2.6) and scoring 21,772,800 more layouts, 9 layouts had unique home-key letter assignments, with the top-scoring layout below:

Stability test results
3.2.1. Test 1 results: After repeatedly selecting random letters in the top-scoring layout, creating over 80 million layouts from every permutation of these letters, and computing their scores, we could not find a higher-scoring layout.
3.2.2. Test 2 results: Replacing the flow matrix with the inter-key speed matrix changes the ranking of the layouts, but the top-scored layout stays in top place, attesting to its efficiency with respect to typing speed. We included an additional test, where we multiplied the flow matrix by the finger strength matrix when scoring. The top-scored layout remained in first place once again.
3.2.3. Test 3 results: Removing each scoring parameter did not affect the top-scoring layout's rank, attesting to its robustness to parameter perturbations.
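The procedure behind Test 1 can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the reported results: `find_better_layout` is a hypothetical name, `score` is any function that returns a higher value for a better layout, and the subset size and number of trials are placeholder parameters.

```python
import random
from itertools import permutations


def find_better_layout(layout, score, n_letters=4, n_trials=10, seed=0):
    """Search for a higher-scoring layout by permuting random letter subsets.

    layout: dict mapping key id -> character
    score:  function(layout) -> float, higher is better (hypothetical)
    Returns the first strictly better layout found, or None if there is none.
    """
    rng = random.Random(seed)
    baseline = score(layout)
    keys = sorted(layout)
    for _ in range(n_trials):
        # Pick a random subset of keys and try every arrangement of their letters.
        chosen = rng.sample(keys, n_letters)
        letters = [layout[k] for k in chosen]
        for perm in permutations(letters):
            candidate = dict(layout)
            candidate.update(zip(chosen, perm))
            if score(candidate) > baseline:
                return candidate
    return None
```

A return value of `None` corresponds to the outcome reported for Test 1: no rearrangement of the sampled letters scored higher than the starting layout. Because the number of permutations grows factorially with `n_letters`, the subset size has to stay small for the search to remain tractable.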

The winning Engram layout
These test results all corroborate the choice of the top-scoring layout. For the final letters Q and Z, our scoring model gave a higher score for Q above Z:

Table 5 contains scores computed for different 32-key keyboard layouts using Engram's scoring model and (i) Google's bigram frequency data (first data column), and (ii) bigram frequency data derived from each of the text sources in Table 4 (remaining columns). The Engram layout scored higher than any other layout for all text sources.

Table 5. Engram scoring model scores for layouts using publicly available text data.

Table 6 contains scores for the same layouts according to the online Keyboard Layout Analyzer. The Engram layout scored in first place for 6 of the 11 text sources: Alice in Wonderland (Ch. 1), Bhagavad Gita, 100,000 tweets, MASC tweets, COCA blogs, and Google website; it scored in second place to Dvorak for Romeo and Juliet and to QGMLWB for the 20,000 tweet dataset. The Dvorak layout scored higher than the Engram layout for three text sources, more than any other layout, according to the Keyboard Layout Analyzer's scoring algorithm.

Table 6. Keyboard Layout Analyzer scores for layouts using publicly available text data.

Discussion
In this study, we presented a set of ergonomics principles relevant to touch typing on standard (diagonal) and specialized (orthonormal) computer keyboards. We encoded these principles in a scoring model as part of a systematic approach for developing optimized keyboard layouts. As an example application, we created the Engram keyboard layout optimized for touch typing in the English language using letter-pair frequencies from Google's N-gram data. We conducted a variety of tests to evaluate the robustness of the winning layout to perturbations in letter arrangements and scoring parameter settings. The winning layout is the outcome of careful initializations and scoring of over 100 million layouts. Finally, we used our Engram scoring model and the online Keyboard Layout Analyzer's simple scoring method to compare the Engram layout with 10 prominent keyboard layouts. For this comparison, we used a variety of publicly available text sources, including hundreds of thousands of tweets, spoken transcripts, etc. The Engram layout scored higher on all of these text sources than any other keyboard layout using the Engram scoring algorithm, and scored in first place for more text sources than any other layout using the Keyboard Layout Analyzer's scoring algorithm.
The Engram keyboard layout can be installed on Linux, macOS, and Windows (https://keyman.com/keyboards/engram) and the open source software can be used to create other optimized keyboard layouts in different languages or tailored to different character-pair frequency data (https://github.com/binarybottle/engram).

Acknowledgments
I would like to thank my family for supporting this endeavor, the Keyman tools and community for helping to make distribution of the Engram layout possible, and that damned DEC workstation at the MIT Media Lab that introduced me to repetitive strain injury 25 years ago, which has prompted me over the years to experiment with voice dictation, one-handed and keyless keyboards, foot pedals, foot mouse, and wonderful keyboards like the Kinesis Advantage and Ergodox. I would also like to express my gratitude to all of my predecessors who have made valiant efforts to improve our relationship with computers by advancing the ergonomics of keyboard designs and keyboard layouts. This article was comfortably typed using the Engram layout on macOS and Linux laptops.

Funding details
This work was independently supported by the author.

Disclosure statement
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Data availability and deposition
All of the bigram frequency and timing data used to optimize the Engram layout accompanies the open source software in the engram public GitHub repository (https://github.com/binarybottle/engram). All of the text data used to evaluate the Engram layout is in the text_data public GitHub repository (https://github.com/binarybottle/text_data).