When a document contains a simple font with a predefined encoding and no ToUnicode mapping, the text should be extracted with the following algorithm:
Currently, the PdfProcessing library does not map the character codes properly, which leads to incorrectly encoded text content.
When importing a document with a predefined ToUnicode CMap (e.g. Identity-H), an InvalidCastException is thrown with the message: Unable to cast object of type 'Telerik.Windows.Documents.Fixed.FormatProviders.Pdf.Model.Types.PdfName' to type 'Telerik.Windows.Documents.Fixed.FormatProviders.Pdf.Model.Elements.CMaps.ToUnicodeCMap'.
When merging files that contain the character with code 159 ('\u009f'), an ArgumentException ("The encoding is not supported.") is thrown.
Trying to clone the Signature of a SignatureField leads to an InvalidOperationException because the FieldName of the cloned signature is already set.
When a TrueType font is defined, the mapping of character codes to glyph indices depends on the built-in cmap table mappings defined in the font and the Encoding property defined in the PDF dictionary.
However, the current implementation maps all characters using the cmap tables for Microsoft Symbol and Macintosh Roman, which produces incorrect mapping results; for example, space characters are mapped to an Ê glyph.
The issue is also described in the following public item: TryGetCharCode for OpenTypeFont uses wrong cmap and returns wrong charcode.
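The dependency on both the font's cmap tables and the Encoding entry can be illustrated with a minimal sketch. It assumes the subtable preference order given in the PDF specification's rules for TrueType font encodings, as I read them: a nonsymbolic font with MacRomanEncoding or WinAnsiEncoding prefers the (3,1) Microsoft Unicode subtable and then (1,0) Macintosh Roman, while other fonts prefer (3,0) Microsoft Symbol and then (1,0). The names CMapId and TrueTypeCMapSelector are hypothetical and are not part of the PdfProcessing API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration only; identifies a cmap subtable
// by its (platform ID, encoding ID) pair.
public readonly record struct CMapId(int PlatformId, int EncodingId);

// Hypothetical helper; a sketch of the selection step, not the library's implementation.
public static class TrueTypeCMapSelector
{
    public static readonly CMapId MicrosoftSymbol = new(3, 0);
    public static readonly CMapId MicrosoftUnicode = new(3, 1);
    public static readonly CMapId MacintoshRoman = new(1, 0);

    // Returns the subtable to use, or null when none of the preferred subtables is present.
    // hasNamedEncoding: the font dictionary's Encoding is MacRomanEncoding or WinAnsiEncoding.
    public static CMapId? Select(
        IReadOnlyCollection<CMapId> available, bool isSymbolic, bool hasNamedEncoding)
    {
        // Assumed preference order (see the lead-in); the first match wins.
        CMapId[] preference = (!isSymbolic && hasNamedEncoding)
            ? new[] { MicrosoftUnicode, MacintoshRoman }
            : new[] { MicrosoftSymbol, MacintoshRoman };

        foreach (var candidate in preference)
        {
            if (available.Contains(candidate))
            {
                return candidate;
            }
        }

        return null;
    }
}
```

With this sketch, a nonsymbolic font that declares WinAnsiEncoding and contains a (3,1) subtable resolves to the Microsoft Unicode subtable rather than to the Symbol or Macintosh Roman one. Only the selection step is shown; the subsequent lookup from character code to glyph index is omitted.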
Workaround: Change the font of the TextBoxField's widget appearance:
foreach (var widget in field.Widgets)
{
    widget.TextProperties.Font = FontsRepository.Helvetica;
}