Completed
Last Updated: 31 Jan 2024 11:39 by ADMIN
Release 2024 Q1 (2024.1.130)
Martin
Created on: 09 Dec 2022 11:48
Category: UI for WinForms
Type: Feature Request
RadGridView: Improve performance of StringTokenizer

We work with a RadGridView with 145000 rows and 4 columns. We use copy-paste to move data between other applications and our application, which is built with Telerik.

When we copy a flat file (with tab-delimited fields) and paste it into the RadGridView, the whole process is painfully slow. The function that retrieves the data from the clipboard takes minutes (maybe hours; I cancelled it). I tracked the cause down to the StringTokenizer class. The tokenizer splits the string into separate fields, but after extracting each field it creates a new copy of the remaining string (about 10MB of data) minus that field. I patched it (with HarmonyX) and now it takes only one second:

static class StringTokenizerPerformancePatch
{
    static private readonly InstanceFieldAccessor<StringTokenizer, LinkedList<string>> _tokens =
        new InstanceFieldAccessor<StringTokenizer, LinkedList<string>>("tokens");

    static private readonly InstanceFieldAccessor<StringTokenizer, string> _sourceString =
        new InstanceFieldAccessor<StringTokenizer, string>("sourceString");

    static private readonly InstanceFieldAccessor<StringTokenizer, string> _delimiter =
        new InstanceFieldAccessor<StringTokenizer, string>("delimiter");

    static private readonly InstanceFieldAccessor<StringTokenizer, IEnumerator<string>> _enumerator =
        new InstanceFieldAccessor<StringTokenizer, IEnumerator<string>>("enumerator");

    [HarmonyPatch(typeof(StringTokenizer), "Tokenize")]
    static class Patch_StringTokenizer_Tokenize
    {
        static bool Prefix(StringTokenizer __instance)
        {
            var tokens = _tokens.GetValue(__instance);
            var sourceString = _sourceString.GetValue(__instance);
            var delimiter = _delimiter.GetValue(__instance);

            Tokenize(tokens, sourceString, delimiter);

            _enumerator.SetValue(__instance, tokens.GetEnumerator());
            return false;
        }

        static private void Tokenize(LinkedList<string> tokens, string text, string delimiter)
        {
            tokens.Clear();
            if (string.IsNullOrEmpty(text))
                return;

            int index = 0;
            while (true)
            {
                var index2 = text.IndexOf(delimiter, index, StringComparison.Ordinal);
                if (index2 < 0)
                {
                    tokens.AddLast(text.Substring(index));
                    break;
                }

                string token = text.Substring(index, index2 - index);
                tokens.AddLast(token);
                index = index2 + delimiter.Length;
            }
        }
    }
}

Please update your tokenizer to increase performance. While you are at it:
  • If delimiters are one character, why not use String.Split?
  • Why use a LinkedList?
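The two questions above can be sketched as follows. This is only an illustration under stated assumptions: `SimpleTokenizer` and `Tokenize` are hypothetical names, not part of the Telerik API, and the sketch assumes callers only need the tokens in order, so an array-backed `List<string>` can stand in for the `LinkedList<string>`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Telerik's API.
static class SimpleTokenizer
{
    // Splits text on a non-empty delimiter and returns the tokens in order.
    public static List<string> Tokenize(string text, string delimiter)
    {
        if (string.IsNullOrEmpty(delimiter))
            throw new ArgumentException("Delimiter must not be empty.", nameof(delimiter));

        if (string.IsNullOrEmpty(text))
            return new List<string>();

        // Single-character delimiter: String.Split scans the input once.
        if (delimiter.Length == 1)
            return new List<string>(text.Split(delimiter[0]));

        // Longer delimiter: use the string[] overload, keeping empty tokens.
        return new List<string>(text.Split(new[] { delimiter }, StringSplitOptions.None));
    }
}
```

Like the patch above, this keeps empty tokens, so two adjacent delimiters still produce an empty field.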
13 comments
ADMIN
Dess | Tech Support Engineer, Principal
Posted on: 23 Jan 2023 10:08

Hello, Martin,

Thank you for sharing all the results. 

I understand your concerns and the feedback is much appreciated. However, please keep in mind that, generally speaking, RadGridView is not intended to manage half a million records. That is why we suggested RadVirtualGrid, which is the proper choice for interacting with large amounts of data, since it does not load all the data at once but only the visible part of it while scrolling.

The StringTokenizer is widely used in the paste operations for both controls. Our developers have invested some time to perform different tests and indeed a considerable improvement is observed. Thank you for the suggestion and the dedicated time. We will do our best to introduce the improvement in the upcoming releases. 

I have also updated your Telerik points for the efforts and product contribution. It is greatly appreciated.

Regards,
Dess | Tech Support Engineer, Principal
Progress Telerik

Love the Telerik and Kendo UI products and believe more people should try them? Invite a fellow developer to become a Progress customer and each of you can get a $50 Amazon gift voucher.

Martin
Posted on: 17 Jan 2023 12:38

Hi Dess,

I am a bit confused. Let me recap our conversation:

  1. Me: The Paste-method of the RadGridView is slow when pasting half a million records. This is due to the StringTokenizer.
  2. You: RadGridView is not really able to handle half a million records, try the RadVirtualGrid.

    (Me: Taking days to replace the RadGridView by the RadVirtualGrid. After implementing, pasting half a million records. Still slow.)

  3. Me: RadVirtualGrid is also slow, because it also uses StringTokenizer.
  4. You: Of course it is slow, you are pasting the same amount of records.
  5. Me: Eh... then why suggest using the RadVirtualGrid when pasting half a million records was the problem in the first place?

Please see these figures:

RecordCount   Unpatched   Patched     Gain
10000         00:01.561   00:00.002    781
20000         00:06.575   00:00.005   1315
30000         00:15.053   00:00.005   3011
40000         00:27.931   00:00.011   2539
50000         00:47.778   00:00.026   1838
60000         01:08.526   00:00.011   6230
70000         01:35.706   00:00.018   5317
80000         02:02.032   00:00.020   6102
90000         02:37.530   00:00.041   3842
100000        03:16.897   00:00.045   4375

 

What I do not understand is this:

  1. As per your own documentation RadGridView should support a million records. It doesn't.
  2. Then you gave me the suggestion to switch to RadVirtualGrid, because that would support a million records. It doesn't.
  3. You say "The StringTokenizer is globally used in the Telerik Presentation Framework and the TextPrimitive". So it is a core component; how great would it be if that component were really fast?
  4. I gave you a working patch (see the numbers). You answer with "the RadVirtualGrid case will be considered for improvement when addressing this request", but... the problem is not with the RadVirtualGrid or the RadGridView, the problem is with the StringTokenizer, one of your core components.
  5. In short: I do not really understand the hesitation...
ADMIN
Dess | Tech Support Engineer, Principal
Posted on: 16 Jan 2023 12:59

Hello, Martin,

RadVirtualGrid provides an internal mechanism for loading data on demand, e.g. while scrolling. Hence, only the visible cell elements request values, which are provided in the CellValueNeeded event. However, if you try to copy the whole content of RadVirtualGrid, it is expected to request all of the data, which is then stored in the clipboard. In this case there wouldn't be any significant difference, because values for all 145000 rows will be requested in order to copy them to the clipboard.

The StringTokenizer is globally used in the Telerik Presentation Framework and the TextPrimitive. Hence, the RadVirtualGrid case will be considered for improvement when addressing this request.  Thank you for your understanding.

Regards,
Dess | Tech Support Engineer, Principal
Progress Telerik


Martin
Posted on: 13 Jan 2023 15:39

Hi Dess,

Maybe in the case of RadVirtualGrid you might consider this a bug. In the case of RadGridView it was NOT a bug, since RadGridView was not created for millions of records. RadVirtualGrid is, and pasting a million records should not take minutes... right? :)

Regards,

Martin

ADMIN
Dess | Tech Support Engineer, Principal
Posted on: 13 Jan 2023 14:46

Hi, Martin,

Thank you for your feedback. We will consider RadVirtualGrid as well when addressing this item.

Make sure that you cast your vote for the item in order to increase its priority.

Regards,
Dess | Tech Support Engineer, Principal
Progress Telerik

Virtual Classroom, the free self-paced technical training that gets you up to speed with Telerik and Kendo UI products quickly just got a fresh new look + new and improved content including a brand new Blazor course! Check it out at https://learn.telerik.com/.

Martin
Posted on: 13 Jan 2023 13:55

Hi Dess,

You might have noticed I am switching to RadVirtualGrid. I discovered that RadVirtualGrid is also using StringTokenizer. This means that the same performance problem will remain.

Regards,

Martin

ADMIN
Dess | Tech Support Engineer, Principal
Posted on: 20 Dec 2022 14:18

Hi, Martin,

To be honest, 1 million records is too much for RadGridView. Instead of using the virtual mode of RadGridView, the recommended approach is to use RadVirtualGrid, a grid component built on top of the Telerik Presentation Framework. It provides a convenient way to implement your own data management operations and optimizes performance when interacting with large amounts of data. Please refer to the following help article on getting started with RadVirtualGrid:

https://docs.telerik.com/devtools/winforms/controls/virtualgrid/getting-started 

As to the question about the editor options when commenting in a feature request from the public feedback portal, these are the editor's options on my account as well: 

When submitting the request itself, it is expected to have more options for providing as much information as possible for describing the issue:

I believe this information helps.

Regards,
Dess | Tech Support Engineer, Principal
Progress Telerik


Martin
Posted on: 17 Dec 2022 13:40

About performance in general: What may we (clients) expect? This article states "You want to bind RadGridView to over a million items. In such cases it is recommended to use Virtual Mode". But... we are not even at half of that and some operations are already far too slow.

So can you express some minimum expectations? Something like: When you load a RadGridView with 1M records, all bulk operations (like loading, select-all, delete-all selected, copy, paste, etc.) should take a maximum of 10 seconds. (Maybe even  based on a certain processor type).

In that case we can report bugs instead of requests. In that case we also know when to switch to Virtual Mode or RadVirtualGrid. In that case I also expect Progress to test the performance before a new release. 

This feature request is just one example of many about performance issues of RadGridView. Yesterday I also patched a piece of code to increase performance for a bulk append operation. Right now, for me, it is frustrating to do so, because I have the feeling I am fixing the problems of Progress. But if I am expecting more of the grid than what is defined, then it is my choice and my struggle. 

Martin
Posted on: 17 Dec 2022 13:12

When I create a new request/bug, then I see the bar as you have shown. 

When I reply (or add a new comment), I see this bar:

 

ADMIN
Dess | Tech Support Engineer, Principal
Posted on: 16 Dec 2022 15:04

Hi, Martin,

When you post a reply from your Telerik account, you are expected to see the "Insert Code" options in the activated editor:

As to the reported feedback item, I have approved it so we can further investigate how to improve the performance for the StringTokenizer.

I have also updated your Telerik points.

Make sure that you cast your vote for the item in order to increase its priority. You can subscribe for status changes and add your comments.

Regards,
Dess | Tech Support Engineer, Principal
Progress Telerik


Martin
Posted on: 14 Dec 2022 13:17

Hi Dess,

I'll do better than that and focus on the component in question: The StringTokenizer. See the code below (off topic: When I create a new request or bug, I can insert code. When I am answering on an existing request/bug, that button is gone... why?)

In this example I create "clipboarddata" with 50,000 records and 4 columns. The total length of the string is 1.2MB. Constructing this tokenizer takes 38 seconds (on my machine). This delay grows roughly quadratically with the size of the data! Using my patch it takes only 0.03 seconds and grows linearly. You can imagine how slow it performs having clipboard data of 10MB.

This tokenizer is used during pasting plain text and pasting HTML (and maybe in other places). 

==================================================================

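using System;
using System.Diagnostics;
using System.Text;
// StringTokenizer comes from the Telerik WinForms assemblies.

static void MeasurePerformance()
{
    // Build tab-delimited "clipboard" text: 50,000 rows of 4 columns.
    var sb = new StringBuilder();
    for (var i = 0; i < 50000; i++)
        sb.Append($"{i}\t{i + 1}\t{i + 2}\t{i + 3}\r\n");

    var clipboardData = sb.ToString();

    // Time only the construction of the tokenizer.
    var stopwatch = new Stopwatch();
    stopwatch.Start();
    var tokenizer = new StringTokenizer(clipboardData, "\n");
    stopwatch.Stop();
    Console.WriteLine(stopwatch.Elapsed.TotalSeconds);
}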
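For readers without the Telerik assemblies, the growth described above can be reproduced with a stand-alone sketch. `TokenizeByCopying` is an assumed reconstruction of the behaviour reported in this thread (copying the rest of the string after each token), not Telerik's actual code; `TokenizeByIndex` mirrors the index-based patch. All names here are hypothetical. With n tokens in a string whose length is proportional to n, the copying variant does O(n) work per token and O(n²) overall, while the index-based variant visits each character once.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

// Hypothetical demo class; names are for illustration only.
static class TokenizerGrowthDemo
{
    // Assumed slow pattern: after each token, allocate a new string that
    // holds the rest of the input. Requires a non-empty delimiter.
    public static List<string> TokenizeByCopying(string text, string delimiter)
    {
        var tokens = new List<string>();
        var remaining = text;
        while (true)
        {
            int pos = remaining.IndexOf(delimiter, StringComparison.Ordinal);
            if (pos < 0) { tokens.Add(remaining); break; }
            tokens.Add(remaining.Substring(0, pos));
            remaining = remaining.Substring(pos + delimiter.Length); // copies the tail
        }
        return tokens;
    }

    // Index-based scan: moves a start offset forward, no tail copies.
    public static List<string> TokenizeByIndex(string text, string delimiter)
    {
        var tokens = new List<string>();
        int start = 0;
        while (true)
        {
            int pos = text.IndexOf(delimiter, start, StringComparison.Ordinal);
            if (pos < 0) { tokens.Add(text.Substring(start)); break; }
            tokens.Add(text.Substring(start, pos - start));
            start = pos + delimiter.Length;
        }
        return tokens;
    }

    // Same row layout as the MeasurePerformance example above.
    public static string BuildRows(int rowCount)
    {
        var sb = new StringBuilder();
        for (var i = 0; i < rowCount; i++)
            sb.Append($"{i}\t{i + 1}\t{i + 2}\t{i + 3}\r\n");
        return sb.ToString();
    }

    // Prints timings for doubling input sizes; actual numbers depend on the machine.
    public static void Run()
    {
        foreach (int rows in new[] { 10000, 20000, 40000 })
        {
            string data = BuildRows(rows);
            var sw = Stopwatch.StartNew();
            TokenizeByCopying(data, "\n");
            TimeSpan copying = sw.Elapsed;
            sw.Restart();
            TokenizeByIndex(data, "\n");
            Console.WriteLine($"{rows} rows: copying {copying.TotalSeconds:F3}s, index {sw.Elapsed.TotalSeconds:F3}s");
        }
    }
}
```

Calling `TokenizerGrowthDemo.Run()` prints one line per input size. If the assumption about the slow pattern holds, doubling the row count should roughly quadruple the copying time and roughly double the index-based time.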

 

 

ADMIN
Dess | Tech Support Engineer, Principal
Posted on: 13 Dec 2022 15:02

Hello, Martin,

I am really sorry to hear that you are experiencing any difficulties with our RadGridView control.  However, in order to confirm if any problem exists I would kindly ask you to provide a sample flat file (with tabs delimited fields) which you are using for the copy/paste functionality. Then, we would be able to open the file, select the whole content and copy it. Then, paste it in RadGridView and make an adequate analysis of the precise case. If any specific setup is available for the grid as well, feel free to provide a sample runnable project demonstrating the problem you are facing. Thus, we would be able to make an adequate analysis of the precise case and provide further assistance.

Thank you in advance. I am looking forward to your reply.

Regards,
Dess | Tech Support Engineer, Principal
Progress Telerik


Martin
Posted on: 09 Dec 2022 12:26

Addition:

As suggested in my first post, StringTokenizer can make use of String.Split if the delimiter is only one character. But... In the methods MasterGridViewTemplate.GetHtmlData and MasterGridViewTemplate.GetTextData why not use String.Split and remove the use of the StringTokenizer?
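As a rough sketch of what that could look like for plain-text clipboard data: the code below splits tab-delimited text into rows and cells using only `String.Split`. `ClipboardTextParser` and `ParseTextData` are hypothetical names; this is not the body of `MasterGridViewTemplate.GetTextData`, whose implementation is not shown in this thread, and it assumes rows end with "\r\n" or "\n" and cells are separated by a tab.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Telerik's API.
static class ClipboardTextParser
{
    // Returns one string[] of cell values per line of tab-delimited text.
    public static List<string[]> ParseTextData(string text)
    {
        var rows = new List<string[]>();
        if (string.IsNullOrEmpty(text))
            return rows;

        // "\r\n" is listed first so a Windows line break is consumed whole.
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);

        // A trailing line break yields one empty final entry; skip it.
        int count = lines.Length;
        if (lines[count - 1].Length == 0)
            count--;

        for (int i = 0; i < count; i++)
            rows.Add(lines[i].Split('\t'));

        return rows;
    }
}
```

Empty cells are preserved, so a line such as "a\t\tb" yields three cells with an empty middle one.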