Commit 2648ce6

Ron Petrusha authored
Merge pull request dotnet#518 from rpetrusha/icu-libraries
Noted use of ICU libraries for categorizing/comparing characters
2 parents 07f6a84 + c525881 commit 2648ce6

10 files changed

+89
-70
lines changed

includes/unicode-categories.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
1+
.NET maintains its own table of characters and their corresponding categories, which ensures that a specific version of a .NET implementation running on different platforms returns identical character category information. On .NET Core running on Linux and macOS, character category information is provided by [International Components for Unicode](http://site.icu-project.org/) libraries.
2+
3+
The following table lists .NET versions and the versions of the Unicode Standard on which their character categories are based.
4+
5+
|.NET version|Version of the Unicode Standard|
6+
|----------------------------|-------------------------------------|
7+
|.NET Framework 1.1|[The Unicode Standard, Version 4.0.0](https://www.unicode.org/versions/Unicode4.0.0/)|
8+
|.NET Framework 2.0|[The Unicode Standard, Version 5.0.0](https://www.unicode.org/versions/Unicode5.0.0)|
9+
|.NET Framework 3.5|[The Unicode Standard, Version 5.0.0](https://www.unicode.org/versions/Unicode5.0.0)|
10+
|.NET Framework 4|[The Unicode Standard, Version 5.0.0](https://www.unicode.org/versions/Unicode5.0.0)|
11+
|.NET Framework 4.5|[The Unicode Standard, Version 6.3.0](https://www.unicode.org/versions/Unicode6.3.0/)|
12+
|.NET Framework 4.5.1|[The Unicode Standard, Version 6.3.0](https://www.unicode.org/versions/Unicode6.3.0/)|
13+
|.NET Framework 4.5.2|[The Unicode Standard, Version 6.3.0](https://www.unicode.org/versions/Unicode6.3.0/)|
14+
|.NET Framework 4.6|[The Unicode Standard, Version 6.3.0](https://www.unicode.org/versions/Unicode6.3.0/)|
15+
|.NET Framework 4.6.1|[The Unicode Standard, Version 6.3.0](https://www.unicode.org/versions/Unicode6.3.0/)|
16+
|.NET Framework 4.6.2 and later versions|[The Unicode Standard, Version 8.0.0](https://www.unicode.org/versions/Unicode8.0.0/)|
17+
|.NET Core (all versions)|[The Unicode Standard, Version 8.0.0](https://www.unicode.org/versions/Unicode8.0.0/)|
18+
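
The point of the table above is that character categories depend on the version of the Unicode data a runtime ships with. As a rough illustration in Python rather than .NET (the `describe` helper is a hypothetical name used only for this sketch), the standard `unicodedata` module exposes both the category of a character and the version of the Unicode data it is based on:

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode general category of one character."""
    return f"U+{ord(ch):04X} {unicodedata.category(ch)}"


# Version of the Unicode Character Database bundled with this interpreter.
# Characters added or recategorized in a later Unicode version can report
# a different category on a runtime that uses different data.
print("Unicode data version:", unicodedata.unidata_version)

for ch in ["A", "a", "1", " ", "\u20ac"]:
    print(describe(ch))
```

The same idea applies to .NET on Linux and macOS: the categories reported depend on the Unicode data of the installed ICU libraries rather than on a table fixed by the .NET version.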

xml/System.Globalization/CompareInfo.xml

Lines changed: 2 additions & 1 deletion
@@ -82,7 +82,8 @@
8282
8383
]]></format>
8484
</remarks>
85-
<related type="ExternalDocumentation" href="https://www.microsoft.com/en-us/download/details.aspx?id=10921">Sorting Weight Tables for Windows operating systems</related>
85+
<related type="ExternalDocumentation" href="https://www.microsoft.com/en-us/download/details.aspx?id=10921">Sorting Weight Tables for Windows operating systems</related>
86+
<related type="ExternalDocumentation" href="https://www.unicode.org/Public/UCA/latest/allkeys.txt">Default Unicode Collation Element Table, for Linux and macOS</related>
8687
</Docs>
8788
<Members>
8889
<MemberGroup MemberName="Compare">

xml/System.Globalization/CompareOptions.xml

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@
4848
## Remarks
4949
These options denote case sensitivity or necessity to ignore types of characters.
5050
51-
.NET uses three distinct ways of sorting: word sort, string sort, and ordinal sort. Word sort performs a culture-sensitive comparison of strings. Certain nonalphanumeric characters might have special weights assigned to them. For example, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. String sort is similar to word sort, except that there are no special cases. Therefore, all nonalphanumeric symbols come before all alphanumeric characters. Ordinal sort compares strings based on the Unicode values of each element of the string. For a downloadable set of text files that contain information on the character weights used in sorting and comparison operations for Windows operating systems, see [Sorting Weight Tables](https://www.microsoft.com/en-us/download/details.aspx?id=10921).
51+
.NET uses three distinct ways of sorting: word sort, string sort, and ordinal sort. Word sort performs a culture-sensitive comparison of strings. Certain nonalphanumeric characters might have special weights assigned to them. For example, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. String sort is similar to word sort, except that there are no special cases. Therefore, all nonalphanumeric symbols come before all alphanumeric characters. Ordinal sort compares strings based on the Unicode values of each element of the string. For a downloadable set of text files that contain information on the character weights used in sorting and comparison operations for Windows operating systems, see [Sorting Weight Tables](https://www.microsoft.com/en-us/download/details.aspx?id=10921). For the sort weight table for Linux and macOS, see the [Default Unicode Collation Element Table](https://www.unicode.org/Public/UCA/latest/allkeys.txt). The specific version of the sort weight table on Linux and macOS depends on the version of the [International Components for Unicode](http://site.icu-project.org/) libraries installed on the system. For information on ICU versions and the Unicode versions that they implement, see [Downloading ICU](http://site.icu-project.org/download).
5252
5353
5454
The `StringSort` value can only be used with <xref:System.Globalization.CompareInfo.Compare%2A?displayProperty=nameWithType> and <xref:System.Globalization.CompareInfo.GetSortKey%2A?displayProperty=nameWithType>. <xref:System.ArgumentException> is thrown if the StringSort value is used with <xref:System.Globalization.CompareInfo.IsPrefix%2A?displayProperty=nameWithType>, <xref:System.Globalization.CompareInfo.IsSuffix%2A?displayProperty=nameWithType>, <xref:System.Globalization.CompareInfo.IndexOf%2A?displayProperty=nameWithType>, or <xref:System.Globalization.CompareInfo.LastIndexOf%2A?displayProperty=nameWithType>.
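
The difference between ordinal sort and word sort described in the remarks above can be sketched in a few lines. This is a toy approximation in Python, not the weights that Windows or ICU actually use: `word_sort_sketch` is a hypothetical helper that simply gives the hyphen no primary weight and uses it only to break ties.

```python
def ordinal_sort(words):
    """Order strings by the numeric value of each character (code point)."""
    return sorted(words)


def word_sort_sketch(words):
    """Toy word sort: the hyphen is ignored first and only breaks ties."""
    return sorted(words, key=lambda w: (w.replace("-", ""), w))


words = ["coop", "cone", "co-op", "cop"]

# Ordinal: "-" (U+002D) sorts before every letter, so "cone" ends up
# between "co-op" and "coop".
print(ordinal_sort(words))

# Word-sort sketch: "co-op" and "coop" appear next to each other.
print(word_sort_sketch(words))
```

Note that .NET ordinal comparison works on UTF-16 code units, while Python compares code points; the two agree for the ASCII strings used here.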

xml/System.Globalization/SortKey.xml

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@
4646
After you create a sort key for a string, you compare sort keys by calling the static <xref:System.Globalization.SortKey.Compare%2A?displayProperty=nameWithType> method. This method performs a simple byte-by-byte comparison, so it is much faster than the <xref:System.String.Compare%2A?displayProperty=nameWithType> or <xref:System.Globalization.CompareInfo.Compare%2A?displayProperty=nameWithType> method.
4747
4848
> [!NOTE]
49-
> You can download the [Sorting Weight Tables](https://www.microsoft.com/en-us/download/details.aspx?id=10921), a set of text files that contain information on the character weights used in sorting and comparison operations for Windows operating systems.
49+
> You can download the [Sorting Weight Tables](https://www.microsoft.com/en-us/download/details.aspx?id=10921), a set of text files that contain information on the character weights used in sorting and comparison operations for Windows operating systems, and the [Default Unicode Collation Element Table](https://www.unicode.org/Public/UCA/latest/allkeys.txt), the sort weight table for Linux and macOS.
5050
5151
## Performance considerations
5252
When performing a string comparison, the <xref:System.Globalization.SortKey.Compare%2A> and <xref:System.Globalization.CompareInfo.Compare%2A?displayProperty=nameWithType> methods yield the same results, but they target different scenarios.
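
The sort-key idea in the remarks above (compute a key once, then compare keys byte by byte) can be illustrated with a minimal Python sketch. It assumes a deliberately simple key, the case-folded text encoded as UTF-8; real sort keys are built from collation weight tables, and `make_sort_key` and `compare_keys` are hypothetical names, not .NET APIs.

```python
def make_sort_key(text: str) -> bytes:
    """Build a toy sort key: case-folded text encoded as UTF-8 bytes."""
    return text.casefold().encode("utf-8")


def compare_keys(a: bytes, b: bytes) -> int:
    """Byte-by-byte comparison: negative, zero, or positive like a comparer."""
    return (a > b) - (a < b)


names = ["banana", "Apple", "cherry", "apple"]

# Pay the cost of building each key once...
keys = {name: make_sort_key(name) for name in names}

# ...then every later comparison is a plain comparison of bytes.
ordered = sorted(names, key=keys.__getitem__)
print(ordered)
print(compare_keys(keys["Apple"], keys["apple"]))
```

The trade-off is the same one the remarks describe: building a key costs more than a single comparison, so keys pay off when the same strings are compared many times, as in a sort.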

xml/System.Globalization/SortVersion.xml

Lines changed: 13 additions & 4 deletions
@@ -37,7 +37,10 @@
3737
<remarks>
3838
<format type="text/markdown"><![CDATA[
3939
40-
## Remarks
40+
## Remarks
41+
42+
### Sorting and string comparison in the .NET Framework
43+
4144
From the [!INCLUDE[net_v20sp1_long](~/includes/net-v20sp1-long-md.md)] through the [!INCLUDE[net_v40_short](~/includes/net-v40-short-md.md)], each version of the .NET Framework has included tables that contain sort weights and data on string normalization and that are based on a particular version of Unicode. In the [!INCLUDE[net_v45](~/includes/net-v45-md.md)], the presence of these tables depends on the operating system:
4245
4346
- On [!INCLUDE[win7](~/includes/win7-md.md)] and previous versions of the Windows operating system, the tables continue to be used for comparing and ordering strings.
@@ -53,14 +56,20 @@
5356
|[!INCLUDE[net_v45](~/includes/net-v45-md.md)] and later versions of the .NET Framework|[!INCLUDE[win8](~/includes/win8-md.md)] and later Windows operating system versions|Unicode 6.0|
5457
5558
On [!INCLUDE[win8](~/includes/win8-md.md)], because the version of Unicode used in string comparison and ordering depends on the version of the operating system, the results of string comparison may differ even for applications that run on a specific version of the .NET Framework.
56-
57-
The <xref:System.Globalization.SortVersion> class provides information about the Unicode version used by the .NET Framework for string comparison and ordering. It enables developers to write applications that can detect and successfully handle changes in the version of Unicode that is used to compare and sort an application's strings.
59+
60+
### Sorting and string comparison in .NET Core
61+
62+
All versions of .NET Core rely on the underlying operating system when performing string comparisons. Therefore, the results of a string comparison or the order in which strings are sorted depends on the version of Unicode used by the operating system when performing the comparison. On Linux and macOS, [International Components for Unicode](http://site.icu-project.org/) libraries provide the implementation for comparison and sorting APIs.
63+
64+
### Using the SortVersion class
65+
66+
The <xref:System.Globalization.SortVersion> class provides information about the Unicode version used by .NET for string comparison and ordering. It enables developers to write applications that can detect and successfully handle changes in the version of Unicode that is used to compare and sort an application's strings.
5867
5968
You can instantiate a <xref:System.Globalization.SortVersion> object in two ways:
6069
6170
- By calling the <xref:System.Globalization.SortVersion.%23ctor%2A> constructor, which instantiates a new <xref:System.Globalization.SortVersion> object based on a version number and sort ID. This constructor is most useful when recreating a <xref:System.Globalization.SortVersion> object from saved data.
6271
63-
- By retrieving the value of the <xref:System.Globalization.CompareInfo.Version%2A?displayProperty=nameWithType> property. This property provides information about the Unicode version used by the .NET Framework on which the application is running.
72+
- By retrieving the value of the <xref:System.Globalization.CompareInfo.Version%2A?displayProperty=nameWithType> property. This property provides information about the Unicode version used by the .NET implementation on which the application is running.
6473
6574
The <xref:System.Globalization.SortVersion> class has two properties, <xref:System.Globalization.SortVersion.FullVersion%2A> and <xref:System.Globalization.SortVersion.SortId%2A>, that indicate the Unicode version and the specific culture used for string comparison. The <xref:System.Globalization.SortVersion.FullVersion%2A> property is an arbitrary numeric value that reflects the Unicode version used for string comparison, and the <xref:System.Globalization.SortVersion.SortId%2A> property is an arbitrary <xref:System.Guid> that reflects the culture whose conventions are used for string comparison. The values of these two properties are important only when you compare two <xref:System.Globalization.SortVersion> objects by using the <xref:System.Globalization.SortVersion.Equals%2A> method, the <xref:System.Globalization.SortVersion.op_Equality%2A> operator, or the <xref:System.Globalization.SortVersion.op_Inequality%2A> operator.
6675
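
The detection pattern described above (save the version number and sort ID, recreate the version object later, and compare it with the current one) might look like the following sketch. It models the idea in Python and is not the .NET API: `SortVersionSketch`, `needs_reindex`, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class SortVersionSketch:
    """Stand-in for a sort version: an opaque number plus an opaque ID."""
    full_version: int
    sort_id: UUID


def needs_reindex(saved: SortVersionSketch, current: SortVersionSketch) -> bool:
    """Sorted data must be rebuilt if either value differs from the saved one."""
    return saved != current


# Values persisted alongside previously sorted data (made up for this sketch).
saved = SortVersionSketch(393742, UUID("00000000-0000-0000-0000-000000000001"))

# Values reported by the runtime today (also made up).
same = SortVersionSketch(393742, UUID("00000000-0000-0000-0000-000000000001"))
newer = SortVersionSketch(393999, UUID("00000000-0000-0000-0000-000000000001"))

print(needs_reindex(saved, same))
print(needs_reindex(saved, newer))
```

As in the remarks, the individual values carry no meaning on their own; only equality between two version objects matters.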
