Unplanned
Last Updated: 23 Jun 2025 13:16 by ADMIN
Kris
Created on: 20 Jun 2025 12:56
Category: WordsProcessing
Type: Bug Report
WordsProcessing: HtmlFormatProvider: Minified HTML Ignores Margins

When converting HTML to DOCX, margins set on an HTML element are ignored if the HTML is minified. The same styles are exported correctly when the HTML passed to the converter is formatted with line breaks and indentation. The following xUnit test demonstrates this behavior with a simplified example.

using Telerik.Windows.Documents.Flow.FormatProviders.Docx;
using Telerik.Windows.Documents.Flow.FormatProviders.Html;
using Xunit;

namespace MSPI.Tests.Unit;

public class WordExportTest
{
    [Fact]
    public async Task TextExport()
    {
        const string formattedDocumentSavePath = @"C:\Testing\export-test-formatted.docx";
        const string formattedContent = """"
            <p>Test paragraph</p>
            <ol style="margin-left: 100px;">
                <li>Item 1</li>
                <li>Item 2</li>
            </ol>
        """";

        const string minifiedDocumentSavePath = @"C:\Testing\export-test-minified.docx";
        const string minifiedContent = """"<p>Test paragraph</p><ol style="margin-left: 100px;"><li>Item 1</li><li>Item 2</li></ol>"""";

        var htmlFormatProvider = new HtmlFormatProvider();
        var docxFormatProvider = new DocxFormatProvider();

        await using var minifiedDocumentMemoryStream = new MemoryStream();
        var minifiedRadFlowDocument = htmlFormatProvider.Import(minifiedContent, TimeSpan.FromSeconds(30));
        docxFormatProvider.Export(minifiedRadFlowDocument, minifiedDocumentMemoryStream, TimeSpan.FromSeconds(30));
        var minifiedBytes = minifiedDocumentMemoryStream.ToArray();
        await File.WriteAllBytesAsync(minifiedDocumentSavePath, minifiedBytes);

        await using var formattedDocumentMemoryStream = new MemoryStream();
        var formattedRadFlowDocument = htmlFormatProvider.Import(formattedContent, TimeSpan.FromSeconds(30));
        docxFormatProvider.Export(formattedRadFlowDocument, formattedDocumentMemoryStream, TimeSpan.FromSeconds(30));
        var formattedBytes = formattedDocumentMemoryStream.ToArray();
        await File.WriteAllBytesAsync(formattedDocumentSavePath, formattedBytes);
    }
}

The minified HTML produces the following document (screenshot not preserved).

The formatted HTML produces the following document (screenshot not preserved).

ADMIN
Dess | Tech Support Engineer, Principal
Posted on: 23 Jun 2025 13:16

Hello, Kris,

Following the provided information, I prepared a sample project to test the behavior on my end with the latest version of RadWordsProcessing. Indeed, the indent of the list items differs when the HTML content is formatted with line breaks.

I can confirm that the behavior you are observing, where margin styles (e.g., margin-left: 100px; on an <ol> element) are ignored when importing minified HTML but applied correctly with formatted (indented) HTML, is an issue in the HTML-to-DOCX conversion process. The underlying cause is related to how the HTML parser handles whitespace and line breaks: in certain cases, minified HTML leads to incorrect parsing of inline styles, so styles such as margins are skipped during import.
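Because the same markup imports correctly when it contains line breaks, one possible stopgap until a fix ships is to reintroduce line breaks between block-level tags before calling `htmlFormatProvider.Import`. The following is a minimal, untested sketch under that assumption. `HtmlWhitespaceWorkaround` and `InsertLineBreaksBetweenTags` are hypothetical names, not part of the Telerik API, and the set of tags that get a preceding line break is an arbitrary choice. The formatted example in the report also contains indentation, so line breaks alone may not be enough; verify against your RadWordsProcessing version.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, NOT part of Telerik Document Processing.
// Assumption: inserting line breaks between block-level tags makes minified HTML
// import like the formatted HTML from the bug report. Untested workaround.
public static class HtmlWhitespaceWorkaround
{
    // Matches '>' plus any existing whitespace, followed by a '<' that either
    // opens or closes a container (ol, ul, div, table, tr) or opens a leaf block
    // (p, li, td, th, h1-h6). No break is added before </p>, </li>, etc., and
    // inline tags such as <b> or <span> are left untouched, because extra
    // whitespace there could change the text that ends up in the document.
    private static readonly Regex BetweenBlockTags = new Regex(
        @">\s*<(?=(?:/?(?:ol|ul|div|table|tr)|p|li|td|th|h[1-6])\b)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Replaces each match with ">\n<", so every selected tag starts on a new line.
    public static string InsertLineBreaksBetweenTags(string html) =>
        BetweenBlockTags.Replace(html, ">\n<");
}
```

In the test from the report, this would mean importing `HtmlWhitespaceWorkaround.InsertLineBreaksBetweenTags(minifiedContent)` instead of `minifiedContent`, then comparing the resulting DOCX with the formatted one to check whether the margin on the list survives.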

I have approved this item. You can cast your vote for the item, track its progress, subscribe to status changes, and add your comments at the following link:

WordsProcessing: HtmlFormatProvider: Minified HTML Ignores Margins

I have also updated your Telerik points.

We will do our best to introduce a proper fix. Please accept our sincere apologies for the inconvenience caused.

Regards,
Dess | Tech Support Engineer, Principal
Progress Telerik
