Uploading PDF files that require binary encoding to a web service (character sets)
Attaching PDFs, JPGs, and other file types to a web service call can be challenging because the character set in use varies with the application. Whitespace, non-printable data, line endings, tabs, and other special characters can be altered in transit if the correct character set is not selected.
The binary encoding type is one solution to these issues. Binary encoding is a bit-for-bit copy of the data, which makes it useful for transferring non-text files such as PDFs and JPGs. With eFORMz, choose the ISO-8859-1 character set for binary encoding: it maps each of the 256 possible byte values to exactly one character, so no byte is changed along the way.
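The following Python sketch illustrates why a one-for-one character set matters. It assumes, for illustration only, that the receiving side treats the attachment as text: it decodes the bytes into characters using a given character set and then encodes them back. The function name `roundtrip` and the sample bytes are hypothetical and are not part of eFORMz.

```python
# A minimal sketch of why ISO-8859-1 suits binary attachments.
# Assumption: the service treats the attachment as text in a given
# character set, decoding bytes to characters and encoding them back.

def roundtrip(data: bytes, charset: str) -> bytes:
    """Decode bytes as text in `charset`, then encode that text back to bytes.

    errors="replace" substitutes a placeholder for any byte sequence the
    character set cannot represent, which is how the data gets corrupted.
    """
    text = data.decode(charset, errors="replace")
    return text.encode(charset, errors="replace")


# Every possible byte value, 0 through 255.
all_bytes = bytes(range(256))

# Hypothetical sample bytes resembling the start of a PDF file, including
# non-ASCII bytes (0xE2, 0xE3, 0xCF, 0xD3) and line endings.
pdf_like = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"

if __name__ == "__main__":
    for charset in ("iso-8859-1", "utf-8"):
        for label, data in (("all bytes", all_bytes), ("PDF-like", pdf_like)):
            intact = roundtrip(data, charset) == data
            print(f"{charset:<10} {label:<9} unchanged: {intact}")
```

The ISO-8859-1 round trips report the data unchanged, while the UTF-8 round trips do not, because bytes such as 0xE2 followed by 0xE3 are not a valid UTF-8 sequence and get replaced.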
Character sets
A character set is a table that maps each character to a numeric position. For example, ‘A’ is position 65 in most tables and ‘B’ is 66. The first 128 positions (0-127) are the same in most tables; EBCDIC is a notable and frequent exception. Because most tables hold 256 positions (8-bit), the upper characters (128-255) vary from one character set to another.
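As a rough illustration of the table idea, the Python sketch below looks up positions using Python's built-in codecs. The codec names are Python's: `cp037` is one common EBCDIC variant (US/Canada), and `cp1252` is one common Windows table; these are assumptions chosen for the example, not settings taken from eFORMz. The helper `position` is hypothetical.

```python
# A minimal sketch of a character set as a lookup table,
# using Python's built-in codecs.

def position(char: str, charset: str) -> int:
    """Return the table position (byte value) of a single character."""
    encoded = char.encode(charset)
    if len(encoded) != 1:
        raise ValueError(f"{char!r} is not a single byte in {charset}")
    return encoded[0]

if __name__ == "__main__":
    # The lower half matches across ASCII-based tables, but not in EBCDIC.
    for charset in ("ascii", "iso-8859-1", "cp850", "cp1252", "cp037"):
        print(f"{charset:<10} A={position('A', charset)} B={position('B', charset)}")

    # The same upper-half byte (0x80) means something different in each table.
    upper = bytes([0x80])
    for charset in ("iso-8859-1", "cp850", "cp1252"):
        print(f"{charset:<10} 0x80 -> {upper.decode(charset)!r}")
```

In the ASCII-based tables ‘A’ and ‘B’ land at 65 and 66, while EBCDIC places them at 193 and 194 (0xC1 and 0xC2). Byte 0x80 decodes to a control character in ISO-8859-1, ‘Ç’ in PC-850, and ‘€’ in the Windows table.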
- The eFORMz default character set is ROMAN8. This is the classic character set used by HP LaserJet printers.
- ISO-8859-1 is one for one: each of its 256 positions corresponds to exactly one byte value, which makes it useful when handling binary data such as PDF files.
- PC-850 is used by Zebra printers and has a few unusual characters in the lower half (0-127).
- UNICODE defines 1,114,112 positions (U+0000 to U+10FFFF). In its fixed-width UTF-32 form, every character is stored as a 32-bit value.
- UTF-8 can represent every UNICODE character, using 8, 16, 24, or 32 bits (one to four bytes) depending on the character's position in the table. Positions 0-127 take a single byte and match ASCII.
- Windows is an 8-bit (256-position) table that maps characters according to Microsoft standards.
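The difference between the fixed-width and variable-width UNICODE encodings can be checked with a short Python sketch. It relies only on Python's built-in `utf-8` and `utf-32-be` codecs (the big-endian variant is used so that no byte-order mark is added); the helper `encoded_bits` and the sample characters are hypothetical choices for illustration.

```python
# A minimal sketch contrasting the fixed-width UTF-32 form of UNICODE
# with the variable-width UTF-8 encoding, using Python's built-in codecs.

# UNICODE positions run from U+0000 to U+10FFFF.
UNICODE_POSITIONS = 0x10FFFF + 1  # 1,114,112

def encoded_bits(char: str, encoding: str) -> int:
    """Return how many bits `char` occupies in the given encoding."""
    return len(char.encode(encoding)) * 8

# One sample character from each UTF-8 length class, written as escapes.
samples = {
    "A": "A",                       # U+0041, ASCII range
    "e-acute": "\u00e9",            # U+00E9
    "euro sign": "\u20ac",          # U+20AC
    "grinning face": "\U0001F600",  # U+1F600, beyond 16 bits
}

if __name__ == "__main__":
    print(f"UNICODE positions: {UNICODE_POSITIONS:,}")
    for name, char in samples.items():
        print(f"{name:<14} UTF-8: {encoded_bits(char, 'utf-8'):>2} bits  "
              f"UTF-32: {encoded_bits(char, 'utf-32-be')} bits")
```

Every sample takes 32 bits in UTF-32, while UTF-8 uses 8, 16, 24, and 32 bits respectively, so plain ASCII text stays compact in UTF-8.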
Tags: Character set, EBCDIC, ISO-8859-1, PC-850, UNICODE, UTF-8