According to the PDF Specification: A redaction annotation (PDF 1.7) identifies content that is intended to be removed from the document.
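There are two possible workarounds for this functionality.
The first clears every text fragment whose text matches a pattern (here, an email address):
foreach (RadFixedPage page in document.Pages)
{
    foreach (ContentElementBase elementBase in page.Content)
    {
        // IsValidEmail is a user-defined helper (not shown).
        if (elementBase is TextFragment textFragment && IsValidEmail(textFragment.Text))
        {
            textFragment.Text = string.Empty;
        }
    }
}
The second clears every text fragment that intersects a given area:
foreach (RadFixedPage page in document.Pages)
{
    foreach (ContentElementBase elementBase in page.Content)
    {
        // IsIntersecting is a user-defined helper (not shown); rectangleGeometry is the area to clear.
        if (elementBase is TextFragment textFragment && IsIntersecting(rectangleGeometry, textFragment))
        {
            textFragment.Text = string.Empty;
        }
    }
}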
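The IsValidEmail helper called in the first loop above is not shown in the original report. A minimal sketch of what it might look like follows, assuming each TextFragment holds a complete email address and nothing else; the class name RedactionHelpers and the deliberately loose pattern are illustrative assumptions, not part of the PdfProcessing API.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RedactionHelpers
{
    // Hypothetical, intentionally loose pattern: something@something.something,
    // with no whitespace and exactly one '@'. It is not a full RFC 5322 validator.
    private static readonly Regex EmailPattern = new Regex(
        @"^[^@\s]+@[^@\s]+\.[^@\s]+$",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    // Returns true when the whole (trimmed) text looks like an email address.
    public static bool IsValidEmail(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return false;
        }

        return EmailPattern.IsMatch(text.Trim());
    }
}
```

To use it from the loop, either place the method in the same class or call it as RedactionHelpers.IsValidEmail(textFragment.Text). If an address is split across several fragments, or embedded in a longer run of text, a whole-string match like this will miss it, so the pattern would need to be adapted.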
When importing a document that contains a CIDFont whose default width (DW) is stored as a PdfReal (double), an exception is thrown: System.InvalidCastException: 'Unable to cast object of type 'Telerik.Windows.Documents.Fixed.FormatProviders.Pdf.Model.Types.PdfReal' to type 'Telerik.Windows.Documents.Fixed.FormatProviders.Pdf.Model.Types.PdfInt'.'
44 0 obj
<< /BaseFont /NotoSansMono-Medium /CIDSystemInfo << ... >> /CIDToGIDMap /Identity /DW 600.00000 ... /Subtype /CIDFontType2 /Type /Font >>
endobj
According to the PDF Specification: DW - integer - (Optional) The default width for glyphs in the CIDFont. Default value: 1000.
According to the PDF Specification: A given object number must not have an entry in more than one subsection within a single section.
The object on line 7 has object number 2, the same as the object on line 5.
When exporting a specific document that uses a CIDFontType0 font (Korean TAWBUL+HGGGothicssiP80g or BPRSCV+HGGGothicssiP60g), the document is exported incorrectly and some of its content is missing.
Workaround:
After import you can change the font:
pdfDocument = provider.Import(memory);
FontBase malgunGothicFont;
FontsRepository.TryCreateFont(new FontFamily("Malgun Gothic"), out malgunGothicFont);
foreach (RadFixedPage page in pdfDocument.Pages)
{
    foreach (ContentElementBase element in page.Content)
    {
        TextFragment textFragment = element as TextFragment;
        if (textFragment != null && (textFragment.Font.Name == "TAWBUL+HGGGothicssiP80g" || textFragment.Font.Name == "BPRSCV+HGGGothicssiP60g"))
        {
            textFragment.Font = malgunGothicFont;
        }
    }
}
When exporting PDF documents that contain images in formats other than JPEG and JPEG2000, PdfProcessing by default uses the ImageSharp library to convert these images to JPEG.
There seems to be an issue in older versions of the ImageSharp library: saving a PNG as JPEG only processes a part of the image on .NET 6.
Workaround: This issue seems to be fixed in the current version (2.0.0) of the ImageSharp library.
Currently, text extraction follows the same behavior (text distance) as text exported with Adobe.
Provide a setting in TextFormatProvider to keep the original distance between text, as in the PDF document.