Creating a Regular Expression for US Tail Numbers

A table on Flight Historian of Paul's tail numbers, showing a country flag, aircraft type, airline, and number of flights for each tail.

One of the minor features I’ve added to Flight Historian is country flags for tail numbers. Every aircraft is registered to one country, and each country has its own assigned format for tail numbers, so it’s possible to look at each tail number and determine what country it’s from.

Since this operation matches a string against a pattern, it made sense to create a regular expression for each country. For most countries, whose tail numbers are a unique prefix followed by a dash and three or four letters, this was easy to do. But the United States rules for valid tail numbers are substantially more complicated.
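
To make the per-country idea concrete, here is a minimal sketch in Ruby. The table and the helper name are hypothetical, and only two well-known formats (United Kingdom and Germany) are filled in; it is not the actual Flight Historian code.

```ruby
# Hypothetical lookup table: country code => pattern for that country's tail numbers.
# Only two illustrative entries; a real table would need one entry per country.
COUNTRY_PATTERNS = {
  "GB" => /\AG-[A-Z]{4}\z/, # United Kingdom: G- followed by four letters
  "DE" => /\AD-[A-Z]{4}\z/  # Germany: D- followed by four letters
}.freeze

# Returns the code of the first country whose pattern matches, or nil if none does.
def country_for(tail_number)
  COUNTRY_PATTERNS.find { |_code, pattern| pattern.match?(tail_number) }&.first
end

country_for("G-ABCD") # "GB"
country_for("D-ABCD") # "DE"
country_for("X-1234") # nil
```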

Valid US Tail Numbers

US tail number validity is defined by the Federal Aviation Administration (FAA):

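N-Numbers consist of a series of alphanumeric characters. U.S. registration numbers may not exceed five characters in addition to the standard U.S. registration prefix letter N. These characters may be:

One to five numbers (N12345)
One to four numbers followed by one letter (N1234Z)
One to three numbers followed by two letters (N123AZ)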

To avoid confusion with the numbers one and zero, the letters I and O are not to be used.

An N-Number may not begin with zero. You must precede the first zero in an N-Number with any number 1 through 9. For example, N01Z is not valid.

Building the Regular Expression

Starting the expression is easy; we know that it must start with N.

^N

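Following the N, though, we really have three possibilities, as defined in the FAA list: one to five digits, one to four digits followed by one letter, or one to three digits followed by two letters. This means that after the N, there are three valid strings: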

\d{1,5}
\d{1,4}[A-Z]
\d{1,3}[A-Z]{2}
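
As a quick check of this first draft, the three alternatives can be tried in Ruby (the flavor that Rubular tests). The constant and helper names are made up for illustration, and the \A and \z anchors are my addition so that each pattern has to cover the whole string. Note that the draft still accepts 01Z, which the FAA rule forbids.

```ruby
# First-draft alternatives for what may follow the N (illustrative names).
DRAFT_SUFFIXES = [
  /\A\d{1,5}\z/,        # one to five digits
  /\A\d{1,4}[A-Z]\z/,   # one to four digits, then one letter
  /\A\d{1,3}[A-Z]{2}\z/ # one to three digits, then two letters
].freeze

# True if any of the three draft patterns matches the text after the N.
def draft_suffix?(suffix)
  DRAFT_SUFFIXES.any? { |pattern| pattern.match?(suffix) }
end

draft_suffix?("12345")  # true
draft_suffix?("1234Z")  # true
draft_suffix?("123AZ")  # true
draft_suffix?("1234AZ") # false: six characters
draft_suffix?("01Z")    # true, although N01Z is not a valid tail number
```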

However, we’re not done yet. We know that the first digit may not be zero, so we need to modify our expressions:

[1-9]\d{0,4}
[1-9]\d{0,3}[A-Z]
[1-9]\d{0,2}[A-Z]{2}

Note that because we specified the first digit, we had to decrease the counts of all the remaining digits by one. The FAA also indicated that where letters are used, “I” and “O” are not valid letters. So we need to modify our letter ranges as follows:

[1-9]\d{0,4}
[1-9]\d{0,3}[A-HJ-NP-Z]
[1-9]\d{0,2}[A-HJ-NP-Z]{2}
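
A small Ruby sketch of the refined alternatives, again with illustrative names and with \A and \z added by me to anchor each check to the whole string. It shows that the leading zero and the letters I and O are now rejected, and that the letter class leaves 24 usable letters.

```ruby
# Refined alternatives for what may follow the N (names are illustrative).
REFINED_SUFFIXES = [
  /\A[1-9]\d{0,4}\z/,
  /\A[1-9]\d{0,3}[A-HJ-NP-Z]\z/,
  /\A[1-9]\d{0,2}[A-HJ-NP-Z]{2}\z/
].freeze

# True if any of the three refined patterns matches the text after the N.
def refined_suffix?(suffix)
  REFINED_SUFFIXES.any? { |pattern| pattern.match?(suffix) }
end

# The letters that the class [A-HJ-NP-Z] lets through.
ALLOWED_LETTERS = ("A".."Z").grep(/[A-HJ-NP-Z]/)

ALLOWED_LETTERS.size   # 24 (the alphabet minus I and O)
refined_suffix?("1")   # true
refined_suffix?("10Z") # true: zeros are fine after the first digit
refined_suffix?("01Z") # false: leading zero
refined_suffix?("12I") # false: I is excluded
refined_suffix?("1O")  # false: O is excluded
```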

We’re looking good, but notice that all three of the possibilities start with [1-9]. Every valid US tail number thus starts with an N followed by a digit between 1 and 9, so we should include that 1-9 range up front with the N:

^N[1-9]
\d{0,4}
\d{0,3}[A-HJ-NP-Z]
\d{0,2}[A-HJ-NP-Z]{2}

Now we join the possibilities with “or” pipes (|), wrap them in parentheses, and cap the whole thing off with a dollar sign to indicate the end of the string:

^N[1-9]((\d{0,4})|(\d{0,3}[A-HJ-NP-Z])|(\d{0,2}[A-HJ-NP-Z]{2}))$
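
To convince ourselves that pulling [1-9] out front did not change what matches, the two forms can be compared over a batch of generated strings. This is only a sketch: the constant names and the six-character sample alphabet are my own choices, and the patterns are built with Regexp.new from single-quoted strings so that they can be written exactly as they appear above.

```ruby
# The three alternatives before factoring, exactly as listed earlier.
ALTERNATIVES = [
  '[1-9]\d{0,4}',
  '[1-9]\d{0,3}[A-HJ-NP-Z]',
  '[1-9]\d{0,2}[A-HJ-NP-Z]{2}'
].freeze

# Unfactored: each alternative keeps its own [1-9].
UNFACTORED = Regexp.new('^N(' + ALTERNATIVES.join('|') + ')$')

# Factored: the joined expression from the article, unchanged.
FACTORED = Regexp.new('^N[1-9]((\d{0,4})|(\d{0,3}[A-HJ-NP-Z])|(\d{0,2}[A-HJ-NP-Z]{2}))$')

# Every string made of "N" plus 0 to 6 characters from a small sample alphabet.
SAMPLE_ALPHABET = %w[0 1 9 A I Z].freeze
CANDIDATES = (0..6).flat_map do |length|
  SAMPLE_ALPHABET.repeated_permutation(length).map { |chars| "N" + chars.join }
end

# Strings on which the two expressions disagree; this should come out empty.
MISMATCHES = CANDIDATES.reject { |s| UNFACTORED.match?(s) == FACTORED.match?(s) }
puts "#{CANDIDATES.size} candidates, #{MISMATCHES.size} mismatches"
```

The sample alphabet deliberately includes a zero, the excluded letter I, and an allowed letter, so each rule is exercised without testing every possible string.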

And finally, add the leading and trailing slashes, which delimit a regular expression literal in languages such as Ruby and JavaScript:

/^N[1-9]((\d{0,4})|(\d{0,3}[A-HJ-NP-Z])|(\d{0,2}[A-HJ-NP-Z]{2}))$/
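
Here is a minimal Ruby sketch of putting the finished expression to work. The constant and method names are hypothetical. One caveat is specific to Ruby: ^ and $ anchor to each line rather than to the whole string, so the sketch also includes a stricter variant of my own that swaps in \A and \z and uses non-capturing groups.

```ruby
# The article's finished expression, unchanged.
US_TAIL_NUMBER = /^N[1-9]((\d{0,4})|(\d{0,3}[A-HJ-NP-Z])|(\d{0,2}[A-HJ-NP-Z]{2}))$/

# Hypothetical stricter variant: \A and \z anchor to the whole string,
# whereas Ruby's ^ and $ anchor to each line within it.
US_TAIL_NUMBER_STRICT = /\AN[1-9](?:\d{0,4}|\d{0,3}[A-HJ-NP-Z]|\d{0,2}[A-HJ-NP-Z]{2})\z/

# True if the string is a valid US tail number in uppercase.
def us_tail_number?(tail)
  US_TAIL_NUMBER_STRICT.match?(tail)
end

us_tail_number?("N1")      # true
us_tail_number?("N12345")  # true
us_tail_number?("N1234Z")  # true
us_tail_number?("N123AZ")  # true
us_tail_number?("N01Z")    # false: leading zero
us_tail_number?("N123456") # false: six characters after the N
us_tail_number?("N12IZ")   # false: contains the letter I
us_tail_number?("N1A2")    # false: a digit after a letter
us_tail_number?("G-ABCD")  # false: not a US prefix

# The two variants differ only on multi-line input.
US_TAIL_NUMBER.match?("N123\nnot a tail number")        # true
US_TAIL_NUMBER_STRICT.match?("N123\nnot a tail number") # false
```

For single-line input, such as a value typed into a tester like Rubular, both variants give the same answers.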

We’re done! I used Rubular to test the regular expression with various valid and invalid US tail numbers, and it behaves as expected.

Notes

This example assumes that the tail number string uses all uppercase letters. If you wish to consider tail numbers with lowercase letters valid, you’ll need to add the i (case-insensitive) option after the closing slash of the regular expression.
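
As a short Ruby illustration (the constant name is made up), here is the expression with the case-insensitive option, next to the alternative of normalizing the input with upcase before matching the original expression.

```ruby
# The finished expression with the case-insensitive option appended.
US_TAIL_NUMBER_ANY_CASE = /^N[1-9]((\d{0,4})|(\d{0,3}[A-HJ-NP-Z])|(\d{0,2}[A-HJ-NP-Z]{2}))$/i

US_TAIL_NUMBER_ANY_CASE.match?("n123az") # true
US_TAIL_NUMBER_ANY_CASE.match?("N123az") # true
US_TAIL_NUMBER_ANY_CASE.match?("n12io")  # false: i and o stay excluded in lowercase too

# Alternative: keep the uppercase-only expression and normalize the input first.
UPPERCASE_ONLY = /^N[1-9]((\d{0,4})|(\d{0,3}[A-HJ-NP-Z])|(\d{0,2}[A-HJ-NP-Z]{2}))$/
UPPERCASE_ONLY.match?("n123az")        # false
UPPERCASE_ONLY.match?("n123az".upcase) # true
```

Normalizing first has the side benefit that every tail number ends up stored in one consistent form.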
