The Canadian Payments Association has published the standards for the CPA 005 1464-byte file format, or "CPA 005" for short.
The latest official documentation can be found here. The Royal Bank of Canada (RBC) also lists out the various formats they use, specifically the CPA005 Debit and CPA005 Credit file layouts. Other banks and software vendors also briefly mention the CPA 005 format. That's as far as documentation goes.
Unfortunately, all these documents are riddled with omissions, inconsistencies and blatant errors. Although they are useful to decipher the information a CPA005 (or PAD) file contains, combining all the documentation is still not sufficient to generate a PAD file from scratch. As far as programming is concerned, none of the existing CPA005 documentation is complete.
We went through the painstaking process of cross-referencing the specifications from all available sources and reverse engineering known working files, and the result is the resource you are reading now. We hope that this article removes the uncertainty and unnecessary guesswork for anyone who wishes to implement the CPA 005 standard.
The Purpose of CPA005
Historically, CPA 005 was intended for batched, inter-bank transactions. Thousands of transactions were recorded on tape. Enough time, usually a matter of days, was set aside for the transactions to settle. Most of the settlement time was spent on the physical transportation of the tape. This was the period when one bank had to fully trust the other bank. Not just any bank branch could process the tape. Regional "processing centres" were designated to receive the tapes. For example, according to the documentation, RBC has processing centres in Vancouver, Calgary, Regina, Winnipeg, Toronto, Montreal and Halifax. Nowadays, PAD files are still used to batch pre-authorized transactions. That void cheque you provide to your utility company or loan provider contains the very information that's needed to encode a PAD file.
The Execution Context, or the "Interface"
One thing that all the documentation is missing is how a CPA 005 file is "run". If CPA 005 is anything like a modern web standard such as JSON or XML, shouldn't it also define its interface? Is it REST or SOAP? How is the transmission authenticated? OAuth or other security tokens? As for participating banks that receive PAD files, shouldn't they provide endpoints (web locations) for their services?
It turns out that CPA 005 has none of that. It is a physical file that is generated, can be edited at will, and is then uploaded via an authenticated online banking interface. Essentially, the submitter of a PAD file, or the "originator", can write themselves cheques of arbitrary amounts. Of course they will never charge their clients an amount that's not agreed upon. However, this is enforcement by regulation, not by standards. So much for security.
Now we are ready to generate a file. A PAD file is just a text file with padded fields. The reason it's called a 1464-byte format is that each line contains 1464 characters. The first letter of each line indicates the type of transaction.
Here's a sample PAD file. You may save and open it in a text editor. Set the font to be a mono-space font. For example "console" or "courier". Turn off line wrap. Note that not every text editor is capable of displaying long lines.
Here's a diagram of the CPA 005 layout:
A PAD file has a header and a trailer, denoted by A and Z respectively. In our example, the D line contains a single debit transaction, i.e. taking funds from the client's bank account.
A D line has a 24-character header (not to be confused with the file header, or A record). After that, a transaction is described in 240 characters. Each transaction in a D line is called a "segment". A D line may contain up to 6 segments. If we do the math, 24 + 6×240 = 1464 characters.
If a D line has fewer than 6 segments, it is padded to 1464 characters.
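The layout arithmetic above can be sketched in a few lines of Python. This is only an illustration, not a reference implementation: the function name `build_detail_line` is made up for this example, and filling the unused segment slots with spaces is an assumption, since the official documentation does not say which character to pad with.

```python
RECORD_LENGTH = 1464   # total characters per line
HEADER_LENGTH = 24     # per-line header of a C or D record
SEGMENT_LENGTH = 240   # one transaction
MAX_SEGMENTS = 6       # 24 + 6 * 240 = 1464


def build_detail_line(header: str, segments: list[str], fill: str = " ") -> str:
    """Join a 24-character header and 1 to 6 segments into one 1464-character line."""
    if len(header) != HEADER_LENGTH:
        raise ValueError(f"header must be {HEADER_LENGTH} characters")
    if not 1 <= len(segments) <= MAX_SEGMENTS:
        raise ValueError(f"a line holds 1 to {MAX_SEGMENTS} segments")
    for segment in segments:
        if len(segment) != SEGMENT_LENGTH:
            raise ValueError(f"each segment must be {SEGMENT_LENGTH} characters")
    line = header + "".join(segments)
    # Assumption: unused segment slots are filled with spaces.
    return line.ljust(RECORD_LENGTH, fill)
```

Raising on a wrong length, rather than silently truncating, makes layout mistakes visible before the file ever reaches the uploader.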
A C line has the same layout. In theory, C lines can be mixed with D lines in the same file. In practice, the debits and deposits are batched separately.
Both C and D lines are called "detailed records". Detailed records repeat some information that's already in the header. This is to make sure that the transactions from one file don't get mixed up with another. The only way this can possibly happen is that the tape drive spins too fast and the records are read in out of order. Remember that the banks are still using mainframe computers, to this day!
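As a rough sanity check on the overall structure, the first letter of each line can be inspected before upload. The sketch below only knows the four record types named in this article (A, C, D and Z); the function name `check_record_order` is hypothetical, and a real processor may well accept other record types.

```python
def check_record_order(lines: list[str]) -> list[str]:
    """Return the record-type letters after checking A first, Z last, C/D in between."""
    types = [line[:1] for line in lines]
    if len(types) < 2 or types[0] != "A" or types[-1] != "Z":
        raise ValueError("file must start with an A record and end with a Z record")
    for number, record_type in enumerate(types[1:-1], start=2):
        # Only the detail record types described in this article are accepted here.
        if record_type not in ("C", "D"):
            raise ValueError(f"line {number}: unexpected record type {record_type!r}")
    return types
```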
Clarification of Conventions
Certain text formatting conventions are not defined at all in the official documentation. Through trial and error, we came up with one set of interpretations that seems to satisfy the PAD uploader.
Line breaks: "each line in a PAD file must contain 1464 characters". This definition says nothing about how the lines are separated. To complete the documentation, let's look at the common line breaking options:
|LF||\n||*nix, BSD, OSX|
The processor we tested seemed relatively tolerant. Nearly anything goes, including some ill-formed line breaks (\r\r\n) that make each "line" 1467 characters. Our implementation uses CR+LF (\r\n); this means each line has 1466 characters.
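A minimal sketch of the CR+LF choice in Python follows. Working with bytes avoids any platform-specific newline translation. Two things here are assumptions rather than documented facts: that the file is plain ASCII, and that the last record is also followed by a line break. The name `serialize_records` is made up for this example.

```python
RECORD_LENGTH = 1464
LINE_BREAK = "\r\n"  # CR+LF, so each line occupies 1466 characters in the file


def serialize_records(records: list[str]) -> bytes:
    """Validate record lengths and join the records with CR+LF line breaks."""
    for number, record in enumerate(records, start=1):
        if len(record) != RECORD_LENGTH:
            raise ValueError(
                f"record {number} has {len(record)} characters, expected {RECORD_LENGTH}"
            )
    # Assumption: every record, including the last, ends with a line break.
    text = "".join(record + LINE_BREAK for record in records)
    # Assumption: the file is plain ASCII; non-ASCII characters raise an error.
    return text.encode("ascii")
```

The result can be written with `open(path, "wb")` so that the operating system does not alter the line breaks.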
Padding: when a file creation number (FCN) has 2 digits, say, "13", and the FCN field has an allocation of 4, the field is padded to "0013". This is "zero-left-padding". The documentation assumes that we already know the convention: always zero-pad to the left, and space-pad to the right. The documentation also makes no effort to point out when a field should be zero-padded or space-padded. Our rule of thumb: zero-left-pad numeric fields, space-right-pad text or descriptive fields.
Data Processing Centre: this is a 5-digit number that is very hard to find. It is not a number that you are ever exposed to in your everyday banking. It is certainly not a "routing number" (see later), or a "branch number".
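The rule of thumb can be captured in two small helpers. This is a sketch with made-up function names; rejecting values that are too long, instead of truncating them, is a design choice of this example and not something the documentation prescribes.

```python
def zero_left_pad(value: int, width: int) -> str:
    """Format a non-negative number for a numeric field, e.g. 13 -> '0013'."""
    text = str(value)
    if value < 0 or len(text) > width:
        raise ValueError(f"{value} does not fit a numeric field of width {width}")
    return text.rjust(width, "0")


def space_right_pad(value: str, width: int) -> str:
    """Format text for a descriptive field, padded with spaces on the right."""
    if len(value) > width:
        raise ValueError(f"{value!r} does not fit a text field of width {width}")
    return value.ljust(width, " ")
```

For example, `zero_left_pad(13, 4)` gives `"0013"`, matching the FCN example above.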
Here are some of the data centre numbers we've gathered:
|Bank||Bank Code||Region||Data Centre|
We'll add more entries as they are discovered.
Originator: An Originator ID is yet another number that is nowhere to be found. It is issued when the originator is granted the right to submit PAD/EFT payments.
Long/Short Names: Many names, such as the name of the originator, or the name of a deposit recipient, are stored in both long and short versions. A long version has a 30-character limit while the short name has a limit of 15 characters. The 15-character version is used for printing bank statements.
Settlement Account / Return Path: these are the same concept but expressed in different locations of the file in different formats. When the originator's account locator information is missing, different error messages are given when the file is rejected. In the header (see "Inconsistencies"), the account is called a Settlement Account. In detailed records, the account is a Return Path.
Bank, Transit, Account: these three elements are key to identifying one's bank account. Much like a postal address where the city, street and house number progressively zoom into the final destination, a bank, transit (within a bank) and account (within the transit) form the same progressive locator. The following diagram shows their locations on a cheque:
However, if you find yourself constantly relying on the above picture, there are reasons why you can't easily memorize the locations. For one, the institution, or bank, is in the middle - it is as awkward as the date format September 2018, 25th. Also on many forms, the institution/bank number is written as 4 digits. The cheque shows only 3. For example, CIBC 0010 is just 010 on a cheque.
Routing Number: it is not a separate number. In the context of a void cheque, a "routing number" is simply the combination of the 4-digit bank code and the 5-digit transit number. Some say that a routing number is a 3-digit bank code and a 5-digit transit number, with a leading zero. This is incorrect even though both interpretations end with the same result. The leading zero is not inherent to the routing number. It's not like dialing a long distance phone number where adding a zero makes a number... "routable". The zero belongs to the bank code.
Date Format: A date is expressed as a 6-digit number: three digits for the year, and three for the day of the year. For example, 2018-09-01 is 018 243. January 1st is the 0th day. Calculating by hand, the number of days between January 1st and September 1st in 2018, a non-leap year, would be: 31+28+31+30+31+30+31+31=243.
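The hand calculation can be reproduced with Python's standard `datetime` module. The sketch follows the convention stated above, where January 1st counts as day 0; if a processor turns out to expect a 1-based day of the year, the result would differ by one. The function name `encode_pad_date` is hypothetical, and treating the 3-digit year as a leading zero plus the two-digit year is an assumption.

```python
from datetime import date


def encode_pad_date(day: date) -> str:
    """Encode a date as 6 digits: 3 for the year, 3 for the day offset in the year."""
    # Assumption: the year field is a leading zero followed by the two-digit year.
    year_part = day.year % 100
    # Days elapsed since January 1st of the same year, so January 1st is day 0.
    day_offset = (day - date(day.year, 1, 1)).days
    return f"{year_part:03d}{day_offset:03d}"
```

With this convention, `encode_pad_date(date(2018, 9, 1))` produces `"018243"`, the same value as the worked example.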
File Creation Number (FCN): This is simply a sequence number that rotates from 0001 to 9999. Each submission cannot repeat the previous 10 numbers. This is how duplicate entry is prevented.
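A minimal sketch of the rotation and the "previous 10" rule, as described above, might look like this. The function names are made up, and how the history of submitted numbers is stored is left to the caller.

```python
FCN_MAX = 9999


def next_fcn(previous: int) -> int:
    """Advance the sequence, wrapping from 9999 back to 1."""
    return previous % FCN_MAX + 1


def is_fcn_acceptable(candidate: int, history: list[int], window: int = 10) -> bool:
    """Check that the candidate does not repeat any of the last `window` numbers."""
    return 1 <= candidate <= FCN_MAX and candidate not in history[-window:]


def format_fcn(fcn: int) -> str:
    """Zero-left-pad the FCN to its 4-character field."""
    return f"{fcn:04d}"
```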
Transaction Code: This code has nothing to do with your transaction itself. It would be better named Transaction Category. The CPA has defined the unrestricted codes. The restricted codes can only be used by the federal and provincial governments.
Some common codes are listed here. A code can be more specific. For example, 370 is mortgage, but 371 is specifically residential mortgage.
Inconsistencies in Official Docs
If you program against the official documentation, your PAD file won't be accepted. There are three crucial inaccuracies: two in the RBC documentation, and one in both the RBC and CPA documentation.
At the time of writing this article, the RBC Debit File Format Specifications was last updated in 2013. The CPA's PAD Standard 005 was last updated in 2016.
In the RBC documentation, the Basic Payment Record section, Page 6:
In the CPA documentation, Page 14:
|16||194-202||9||Bank ID for Returns||Numeric|
|17||203-214||12||Account No. for Returns||Alphanumeric|
Every transaction should have a return path in case the debit or credit fails. The return path is the originator's account locator, in the form of a bank (4), transit (5) and account (12) combo. If we omit this information as per the RBC documentation, the PAD file will be rejected.
In a similar fashion, the RBC documentation states that there is nothing to worry about at position 252-253:
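Putting the two CPA fields together, the return path occupies 21 characters: a 9-digit "Bank ID for Returns" (bank plus transit) followed by a 12-character "Account No. for Returns". The sketch below uses a made-up function name and made-up transit and account values. Space-right-padding the alphanumeric account field follows our rule of thumb and is an assumption, not something either document confirms.

```python
def build_return_path(bank: str, transit: str, account: str) -> str:
    """Build the 21 characters covering Bank ID for Returns and Account No. for Returns."""
    if not (bank.isascii() and bank.isdigit() and len(bank) <= 4):
        raise ValueError("bank must be at most 4 digits")
    if not (transit.isascii() and transit.isdigit() and len(transit) <= 5):
        raise ValueError("transit must be at most 5 digits")
    if not (account.isascii() and account.isalnum() and len(account) <= 12):
        raise ValueError("account must be at most 12 alphanumeric characters")
    # Numeric fields: zero-left-pad. A 3-digit cheque bank code such as 010 becomes 0010.
    bank_id = bank.rjust(4, "0") + transit.rjust(5, "0")
    # Assumption: the alphanumeric account field is space-right-padded.
    account_no = account.ljust(12, " ")
    return bank_id + account_no
```

For instance, `build_return_path("010", "12345", "1234567")` yields `"001012345"` followed by the account number and five spaces.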
Cross referencing the CPA documentation with some correction, we came up with this:
Between the end of the header (position 24) and the position 264, there are exactly 240 characters. This means each segment ends with the sequence of:
The next inaccuracy is in the header layout. Both documents say there is nothing significant in this area of the header:
|09||59-1464||1406||Filler, Leave Blank||Alphanumeric|
This is absolutely wrong and mind boggling. As shown in the file layout diagram, there is a little pink area - that's where the settlement account information is provided - all the way to almost the end of the line. This nugget of information is nearly invisible to any text editor unless you actively look for it by horizontally scrolling through the file.
The following chart is the result of our reverse engineering and rigorous testing:
Note that the Account number here is only 7 digits. For some reason, the account number is considered to be "alphanumeric". Maybe it is a 12-digit account with white space padding to the right? We can't tell. At least this sequence of digits works.
After filling in the blanks for some missing definitions, patching the holes in one document with another, and some probing, we can finally generate working PAD files from scratch.