Commit 6075f12
MDEV-31071 Refactor case folding data types in Unicode collations
This is a non-functional change. It changes how case folding data and
weight data (for simple Unicode collations) are stored:

- Removing data types MY_UNICASE_CHARACTER, MY_UNICASE_INFO
- Using data types MY_CASEFOLD_CHARACTER, MY_CASEFOLD_INFO instead.

This patch changes simple Unicode collations in the same way that
MDEV-30695 previously changed Asian collations.

No new MTR tests are needed. The underlying code is thoroughly covered
by a number of ctype_*_ws.test and ctype_*_casefold.test files, which
were added recently in preparation for this change.

Old and new Unicode data layout
-------------------------------

Case folding data is now stored in separate tables consisting of
MY_CASEFOLD_CHARACTER elements with two members:

  typedef struct casefold_info_char_t
  {
    uint32 toupper;
    uint32 tolower;
  } MY_CASEFOLD_CHARACTER;

while weight data (for the simple non-UCA collations xxx_general_ci and
xxx_general_mysql500_ci) is stored in separate arrays of uint16 elements.

Before this change, case folding data and simple weight data were stored
together, in tables of the following elements with three members:

  typedef struct unicase_info_char_st
  {
    uint32 toupper;
    uint32 tolower;
    uint32 sort;     /* weights for simple collations */
  } MY_UNICASE_CHARACTER;

This data format was redundant, because weights (the "sort" member) were
needed only for these two simple Unicode collations:

- xxx_general_ci
- xxx_general_mysql500_ci

Adding case folding information for Unicode-14.0.0 in the old format
would waste memory for no benefit.

Detailed changes
----------------

- Changing the underlying data types as described above

- Including unidata-dump.c into the sources. This program was earlier
  used to dump UnicodeData.txt
  (e.g. https://www.unicode.org/Public/14.0.0/ucd/UnicodeData.txt)
  into MySQL / MariaDB source files. It was originally written in 2002,
  but until now has not been distributed with the MySQL / MariaDB sources.

- Removing the old format Unicode data earlier dumped from UnicodeData.txt
  (versions 3.0.0 and 5.2.0) from ctype-utf8.c. Adding Unicode data in the
  new format into separate header files, to make the code easier to
  maintain:

  - ctype-unicode300-casefold.h
  - ctype-unicode300-casefold-tr.h
  - ctype-unicode300-general_ci.h
  - ctype-unicode300-general_mysql500_ci.h
  - ctype-unicode520-casefold.h

- Adding a new file ctype-unidata.c as an aggregator for the header files
  listed above.

1 parent 2ad287c commit 6075f12
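
The layout change described in the commit message can be illustrated with a small sketch. It is not code from the patch: the two struct definitions follow the commit message (with `uint32` spelled as the standard `uint32_t`), while every `demo_*` name, the flat 128-entry ASCII-only tables, the "weight equals the uppercase code point" rule, and the 0xFFFD fallback are assumptions made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* New element type, as given in the commit message. */
typedef struct casefold_info_char_t
{
  uint32_t toupper;
  uint32_t tolower;
} MY_CASEFOLD_CHARACTER;

/* Old element type, kept here only for the size comparison. */
typedef struct unicase_info_char_st
{
  uint32_t toupper;
  uint32_t tolower;
  uint32_t sort;     /* weights for simple collations */
} MY_UNICASE_CHARACTER;

/* Hypothetical demo tables covering ASCII only (a simplification). */
#define DEMO_TABLE_SIZE 128
static MY_CASEFOLD_CHARACTER demo_casefold[DEMO_TABLE_SIZE];
static uint16_t demo_weights[DEMO_TABLE_SIZE]; /* separate uint16 weights */

/* Fill both tables once; the weight rule is an assumption for the demo. */
static void demo_init(void)
{
  static int done= 0;
  uint32_t c;
  if (done)
    return;
  /* Dropping the "sort" member makes each case folding element smaller. */
  assert(sizeof(MY_CASEFOLD_CHARACTER) < sizeof(MY_UNICASE_CHARACTER));
  for (c= 0; c < DEMO_TABLE_SIZE; c++)
  {
    uint32_t up= (c >= 'a' && c <= 'z') ? c - 32 : c;
    uint32_t lo= (c >= 'A' && c <= 'Z') ? c + 32 : c;
    demo_casefold[c].toupper= up;
    demo_casefold[c].tolower= lo;
    demo_weights[c]= (uint16_t) up;
  }
  done= 1;
}

/* Case folding reads only the case folding table. */
uint32_t demo_toupper(uint32_t wc)
{
  demo_init();
  return wc < DEMO_TABLE_SIZE ? demo_casefold[wc].toupper : wc;
}

uint32_t demo_tolower(uint32_t wc)
{
  demo_init();
  return wc < DEMO_TABLE_SIZE ? demo_casefold[wc].tolower : wc;
}

/* Weights are read from their own array; 0xFFFD is a demo fallback. */
uint16_t demo_weight(uint32_t wc)
{
  demo_init();
  return wc < DEMO_TABLE_SIZE ? demo_weights[wc] : (uint16_t) 0xFFFD;
}

/* Per-character storage cost of the old combined element. */
size_t demo_old_bytes_per_char(void)
{
  return sizeof(MY_UNICASE_CHARACTER);
}

/* Per-character storage cost of a new case folding element. */
size_t demo_new_casefold_bytes_per_char(void)
{
  return sizeof(MY_CASEFOLD_CHARACTER);
}
```

In this sketch a collation that needs only case folding touches `demo_casefold` alone, and the extra two bytes per character in `demo_weights` are paid only by a collation that actually compares by simple weights. This mirrors the motivation given in the commit message, under the simplifying assumptions stated above.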
29 files changed (+7471 -5195 lines) in the include and strings directories.