One of the minor features I’ve added to Flight Historian is country flags for tail numbers. Every aircraft is registered to one country, and each country has its own assigned format for tail numbers, so it’s possible to look at each tail number and determine what country it’s from.

Since this operation is matching a string to a pattern, it made sense to create a regular expression for each country. For most countries, whose tail numbers consist of a unique prefix followed by a dash and three or four letters, this was easy to do. But the United States rules for valid tail numbers are substantially more complicated.

## Valid US Tail Numbers

US tail number validity is defined by the Federal Aviation Administration (FAA):

N-Numbers consist of a series of alphanumeric characters. U.S. registration numbers may not exceed five characters in addition to the standard U.S. registration prefix letter N. These characters may be:

- One to five numbers (N12345)
- One to four numbers followed by one letter (N1234Z)
- One to three numbers followed by two letters (N123AZ)

To avoid confusion with the numbers one and zero, the letters I and O are not to be used.

…

An N-Number may not begin with zero. You must precede the first zero in an N-Number with any number 1 through 9. For example, N01Z is not valid.

## Building the Regular Expression

Starting the expression is easy; we know that it must start with N.

```
^N
```

Following the N, though, we really have three possibilities, as defined in the FAA list:

- One to five digits
- One to four digits followed by one letter
- One to three digits followed by two letters

This means that after the N, there are three valid strings:

```
\d{1,5}
\d{1,4}[A-Z]
\d{1,3}[A-Z]{2}
```
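
To see what these three draft patterns accept on their own, here is a minimal Ruby sketch. Ruby is an assumption on my part (Rubular tests Ruby regular expressions), and the names `SUFFIX_PATTERNS` and `matches_any_suffix?` are hypothetical. The `\A` and `\z` anchors are added only so that each pattern has to match the whole suffix.

```ruby
# Hypothetical names for illustration. These are the three draft suffix
# patterns, before the leading-zero and I/O rules are applied.
SUFFIX_PATTERNS = [
  /\A\d{1,5}\z/,          # one to five digits
  /\A\d{1,4}[A-Z]\z/,     # one to four digits, then one letter
  /\A\d{1,3}[A-Z]{2}\z/   # one to three digits, then two letters
].freeze

# True if the text after the leading "N" matches any draft pattern.
def matches_any_suffix?(suffix)
  SUFFIX_PATTERNS.any? { |pattern| pattern.match?(suffix) }
end

puts matches_any_suffix?("12345")   # five digits
puts matches_any_suffix?("1234Z")   # four digits and a letter
puts matches_any_suffix?("123AZ")   # three digits and two letters
puts matches_any_suffix?("1234AZ")  # six characters, too long for any pattern
puts matches_any_suffix?("01Z")     # leading zero, not yet ruled out
```

At this stage a suffix such as `01Z` still passes, which is the gap the next step closes.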

However, we’re not done yet. We know that the first digit may not be zero, so we need to modify our expressions:

```
[1-9]\d{0,4}
[1-9]\d{0,3}[A-Z]
[1-9]\d{0,2}[A-Z]{2}
```
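
As a rough check of that change, the sketch below compares the one-letter pattern before and after the first digit is pinned to 1 through 9. It is written in Ruby as an assumption, and the constant names `ONE_LETTER_DRAFT` and `ONE_LETTER_NONZERO` are hypothetical.

```ruby
# Hypothetical constants: the one-letter pattern before and after
# requiring a non-zero first digit. \A and \z anchor the whole suffix.
ONE_LETTER_DRAFT   = /\A\d{1,4}[A-Z]\z/
ONE_LETTER_NONZERO = /\A[1-9]\d{0,3}[A-Z]\z/

# [1-9] consumes one digit, so the remaining count drops from {1,4}
# to {0,3}; the total is still one to four digits.
%w[01Z 1Z 10Z 1234Z 12345Z].each do |suffix|
  draft   = ONE_LETTER_DRAFT.match?(suffix)
  nonzero = ONE_LETTER_NONZERO.match?(suffix)
  puts "#{suffix}: draft=#{draft} nonzero=#{nonzero}"
end
```

Only `01Z` changes from accepted to rejected; a zero in any later position, as in `10Z`, is still allowed.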

Note that because we specified the first digit, we had to decrease the counts of all the remaining digits by one. The FAA also indicated that where letters are used, “I” and “O” are not valid letters. So we need to modify our letter ranges as follows:

```
[1-9]\d{0,4}
[1-9]\d{0,3}[A-HJ-NP-Z]
[1-9]\d{0,2}[A-HJ-NP-Z]{2}
```
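
One way to convince yourself that the `[A-HJ-NP-Z]` class leaves out exactly the two forbidden letters is to run every uppercase letter through it. This is a small Ruby sketch (Ruby being an assumption), and `TAIL_LETTER` is a hypothetical name.

```ruby
# Hypothetical constant: the letter class from the patterns above,
# anchored so it must match a single whole character.
TAIL_LETTER = /\A[A-HJ-NP-Z]\z/

letters  = ("A".."Z").to_a
allowed  = letters.select { |letter| TAIL_LETTER.match?(letter) }
rejected = letters.reject { |letter| TAIL_LETTER.match?(letter) }

puts allowed.join   # the uppercase letters the class accepts
puts rejected.join  # the letters it leaves out
```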

We’re looking good, but notice that all three of the possibilities start with [1-9]. Every valid US tail number thus starts with an N followed by a digit between 1 and 9, so we should include that 1-9 range up front with the N:

```
^N[1-9]
\d{0,4}
\d{0,3}[A-HJ-NP-Z]
\d{0,2}[A-HJ-NP-Z]{2}
```

Now we join the possibilities with “or” pipes (|) and parentheses, and cap it off with a dollar sign to indicate the end of the string:

```
^N[1-9]((\d{0,4})|(\d{0,3}[A-HJ-NP-Z])|(\d{0,2}[A-HJ-NP-Z]{2}))$
```

And finally, add the leading and trailing slashes to indicate a regular expression:

```
/^N[1-9]((\d{0,4})|(\d{0,3}[A-HJ-NP-Z])|(\d{0,2}[A-HJ-NP-Z]{2}))$/
```
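
For readers who would rather check the finished expression in code than in a tester, here is a minimal Ruby sketch. Ruby itself is an assumption, the names `US_TAIL_NUMBER` and `us_tail_number?` are hypothetical, and the sample tail numbers are made up to exercise each rule.

```ruby
# The finished expression from above. The constant and method names
# are hypothetical.
US_TAIL_NUMBER = /^N[1-9]((\d{0,4})|(\d{0,3}[A-HJ-NP-Z])|(\d{0,2}[A-HJ-NP-Z]{2}))$/

# True if the string looks like a valid US tail number.
def us_tail_number?(tail_number)
  US_TAIL_NUMBER.match?(tail_number)
end

# Illustrative samples: each invalid entry breaks one of the FAA rules.
valid   = %w[N1 N12345 N1234Z N123AZ N1A N10AB]
invalid = %w[N N0 N01Z N123456 N1234AZ N12I N1O G-ABCD]

valid.each   { |tail| puts "#{tail}: #{us_tail_number?(tail)}" }
invalid.each { |tail| puts "#{tail}: #{us_tail_number?(tail)}" }
```

In Ruby specifically, `^` and `$` match at line boundaries rather than only at the ends of the string, so swapping them for `\A` and `\z` would be the stricter choice if the input could ever contain newlines.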

We’re done! I used Rubular to test the regular expression with various valid and invalid US tail numbers, and it behaves as expected.

## Notes

This example assumes the tail number string uses all uppercase letters. If you wish to consider tail numbers with lowercase letters valid, you’ll need to add the `/i` case-insensitive option at the end of the regular expression.
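
A short sketch of the difference, again assuming Ruby and using the hypothetical constant names `CASE_SENSITIVE` and `CASE_INSENSITIVE`:

```ruby
# Hypothetical constant names. The second pattern is the same
# expression with the i (case-insensitive) option added.
CASE_SENSITIVE   = /^N[1-9]((\d{0,4})|(\d{0,3}[A-HJ-NP-Z])|(\d{0,2}[A-HJ-NP-Z]{2}))$/
CASE_INSENSITIVE = /^N[1-9]((\d{0,4})|(\d{0,3}[A-HJ-NP-Z])|(\d{0,2}[A-HJ-NP-Z]{2}))$/i

puts CASE_SENSITIVE.match?("n123az")    # lowercase input is rejected
puts CASE_INSENSITIVE.match?("n123az")  # lowercase input is accepted
puts CASE_INSENSITIVE.match?("n12io")   # i and o are still excluded
```

An alternative with the same effect would be to call `upcase` on the input before matching against the original expression.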