I am using the trial version of Telerik for Xamarin for .NET Core, which was released last year, to convert PDF to text. Our service is hosted in Azure. In certain cases the spaces are missing from the text I get back (for example, instead of 'I [am] here', it displays 'I[am]here'). This happens randomly.
The code we use is as follows:
byte[] pdfBinary = Convert.FromBase64String(inputString);
When merging files that contain the character with code 159 ('\u009f'), an ArgumentException("The encoding is not supported.") is thrown.
When a TrueType font is defined, the mapping of character codes to glyph indices depends on the built-in cmap table mappings defined in the font and the Encoding property defined in the PDF dictionary.
However, the current implementation maps all characters with cmap tables for Microsoft Symbolic and Macintosh Roman, which causes incorrect mapping results, e.g. space characters are mapped to an Ê glyph.
The issue is also described in the following public item: TryGetCharCode for OpenTypeFont uses wrong cmap and returns wrong charcode.
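The cmap selection described above can be sketched roughly as follows. This is a simplified illustration based on the TrueType encoding rules in the PDF specification, not the library's actual implementation: the CmapSelector class, its Select method and its parameters are hypothetical names introduced here, and the glyph-name and Adobe Glyph List steps that the specification also requires are left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of any Telerik API.
public static class CmapSelector
{
    // (platformID, encodingID) pairs as they appear in a TrueType 'cmap' table.
    public static readonly (int Platform, int Encoding) MicrosoftUnicode = (3, 1);
    public static readonly (int Platform, int Encoding) MicrosoftSymbol = (3, 0);
    public static readonly (int Platform, int Encoding) MacintoshRoman = (1, 0);

    // Picks the cmap subtable to use for a simple TrueType font.
    // isSymbolic: the Symbolic flag of the font descriptor (assumed input).
    // hasEncodingEntry: whether the PDF font dictionary has an Encoding entry.
    // available: the subtables actually present in the embedded font.
    public static (int Platform, int Encoding)? Select(
        bool isSymbolic,
        bool hasEncodingEntry,
        ICollection<(int Platform, int Encoding)> available)
    {
        // A font with no Encoding entry is handled like a symbolic font.
        bool treatAsSymbolic = isSymbolic || !hasEncodingEntry;

        // Symbolic fonts prefer (3, 0); nonsymbolic fonts prefer (3, 1).
        var preferred = treatAsSymbolic ? MicrosoftSymbol : MicrosoftUnicode;
        if (available.Contains(preferred))
        {
            return preferred;
        }

        // Both cases fall back to the Macintosh Roman subtable.
        if (available.Contains(MacintoshRoman))
        {
            return MacintoshRoman;
        }

        // No usable subtable was found.
        return null;
    }
}
```

The point of the sketch is that the choice depends on both the font flags and the Encoding entry. A nonsymbolic font that carries (3, 1), (3, 0) and (1, 0) subtables would resolve to (3, 1) here, rather than always going through the Microsoft Symbolic or Macintosh Roman tables.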
Workaround: Change the font of the TextBoxField's widget appearance:
foreach (var widget in field.Widgets)
{
    widget.TextProperties.Font = FontsRepository.Helvetica;
}
When importing a document with predefined ToUnicode CMaps (e.g. Identity-H), an InvalidCastException is thrown with the message: Unable to cast object of type 'Telerik.Windows.Documents.Fixed.FormatProviders.Pdf.Model.Types.PdfName' to type 'Telerik.Windows.Documents.Fixed.FormatProviders.Pdf.Model.Elements.CMaps.ToUnicodeCMap'.