Completed
Last Updated: 30 Mar 2020 06:57 by ADMIN
Release R2 2020
Martin Robins
Created on: 11 Mar 2020 07:36
Category: WordsProcessing
Type: Feature Request
WordsProcessing: TxtFormatProvider: Add support for exporting line breaks (<br>) to plain text format
Currently, line breaks (<br>) are not preserved when a document is exported to plain text format.

Workaround:
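Replace the <br> tags in the HTML document with a marker:
using System.IO;

string html = File.ReadAllText("Source.html");
// Substitute a text marker for each <br> so the break survives the plain-text export.
string newHtml = html.Replace("<br>", "[br]");
File.WriteAllText("NewSource.html", newHtml);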
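Then import the edited HTML, export it as plain text, and replace the markers with "\r\n":
using System.IO;
using Telerik.Windows.Documents.Flow.FormatProviders.Html;
using Telerik.Windows.Documents.Flow.FormatProviders.Txt;
using Telerik.Windows.Documents.Flow.Model;

using (Stream stream = File.OpenRead("NewSource.html"))
{
	HtmlFormatProvider htmlFormatProvider = new HtmlFormatProvider();
	RadFlowDocument flowDocument = htmlFormatProvider.Import(stream);

	TxtFormatProvider txtFormatProvider = new TxtFormatProvider();
	string text = txtFormatProvider.Export(flowDocument);
	// Turn the markers back into real line breaks.
	string newText = text.Replace("[br]", "\r\n");
}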
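The html.Replace("<br>", "[br]") call in the workaround only matches the exact lowercase <br> form, while HTML also allows <br/>, <br /> and <BR>. The following is a minimal sketch, assuming a regular expression is acceptable for this pre-processing step; LineBreakMarkers, ReplaceBrTags and RestoreLineBreaks are hypothetical helper names and are not part of the Telerik Document Processing API.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Telerik Document Processing): swaps every
// <br> variant for a text marker before the HTML is imported, and turns the
// marker back into a line break after the plain-text export.
public static class LineBreakMarkers
{
    public const string Marker = "[br]";

    // Matches <br>, <br/>, <br />, <BR> and <br> tags with attributes,
    // ignoring case. The \b stops it from matching tags such as <bring>.
    private static readonly Regex BrTag =
        new Regex(@"<br\b[^>]*>", RegexOptions.IgnoreCase);

    public static string ReplaceBrTags(string html) =>
        BrTag.Replace(html, Marker);

    // Uses "\r\n" to match the workaround above.
    public static string RestoreLineBreaks(string text) =>
        text.Replace(Marker, "\r\n");
}
```

In the workaround, html.Replace("<br>", "[br]") would become LineBreakMarkers.ReplaceBrTags(html), and text.Replace("[br]", "\r\n") would become LineBreakMarkers.RestoreLineBreaks(text). Note that a regular expression does not understand HTML structure, so it also rewrites <br> text inside comments or scripts, and any literal "[br]" already present in the document will be turned into a line break as well.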

ADMIN
Peshito
Posted on: 30 Mar 2020 06:57

Hello,

This item will be available in the R2 2020 release.

If you need it earlier, it is also available in the latest internal build of Telerik UI for WPF: LIB 2020.1.330 (03/30/2020).

Regards,
Peshito
Progress Telerik

ADMIN
Martin
Posted on: 19 Mar 2020 14:35

Hello Martin,

Take your time, and when you are ready, please send us the example you mentioned. You can also do this in a new ticket thread.

Regards,
Martin
Progress Telerik

Get quickly onboarded and successful with your Telerik and/or Kendo UI products with the Virtual Classroom free technical training, available to all active customers. Learn More.
Martin Robins
Posted on: 13 Mar 2020 11:22
Bear with me; I will provide a full example at the weekend for you.
ADMIN
Martin
Posted on: 11 Mar 2020 14:18

Hello Martin,

This works on my side. The provided sample HTML document contains two paragraphs, and they are exported on two separate lines. The second paragraph contains only a non-breaking space (&nbsp;), so it is not visible, but it is still exported as a paragraph. If you insert some text into the second paragraph, it becomes visible:

<p class=MsoNormal>
	<o:p>&nbsp; text</o:p>
</p>

I am attaching two screenshots showing the result of the export with a non-breaking space (Screenshot_withNonBreakingSpace.jpg) and with text inserted (Screenshot_withText.jpg).

If the result is not the same on your end, or if I am missing something, could you send us a sample project demonstrating this behavior? You can do this in the mentioned support ticket.

Regards,
Martin
Progress Telerik

Martin Robins
Posted on: 11 Mar 2020 13:16

I will try again...

I am surprised you are asking this given that you raised the issue on my behalf following my support ticket.

Given the following code...

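using System.IO;
using Telerik.Windows.Documents.Flow.FormatProviders.Html;
using Telerik.Windows.Documents.Flow.FormatProviders.Txt;
using Telerik.Windows.Documents.Flow.Model;

using (Stream stream = File.OpenRead("Source.html"))
{
    HtmlFormatProvider htmlFormatProvider = new HtmlFormatProvider();
    RadFlowDocument flowDocument = htmlFormatProvider.Import(stream);

    TxtFormatProvider txtFormatProvider = new TxtFormatProvider();
    string text = txtFormatProvider.Export(flowDocument);
}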

and the attached "Source.html" (inside the zip archive)...

The resulting "text" contains no newline characters; instead, all of the text is on a single continuous line.

This is what I am suggesting should be addressed.

Martin.

 

Martin Robins
Posted on: 11 Mar 2020 13:14
Sorry, I seem to have broken the portal by trying to post HTML!
ADMIN
Martin
Posted on: 11 Mar 2020 11:52

Hi Martin,

I am not sure what the specific case is. Could you give us more information about the scenario?

In the meantime, I tried exporting a sample HTML document containing three paragraphs to plain text:

<!DOCTYPE html>
<html>
<body>
    <p>Paragraph 1.</p>
    <p>Paragraph 2.</p>
    <p>Paragraph 3.</p>
</body>
</html>
and every paragraph is exported on a separate line, as expected.

Correct me if I am missing something. Looking forward to hearing from you.

Regards,
Martin
Progress Telerik

Martin Robins
Posted on: 11 Mar 2020 10:41
The same should apply to paragraph tags (<p></p>), presumably by adding a line break wherever there was an end paragraph tag (</p>)?