Skip to content

Commit 90bd670

Browse files
committed
to_string(): replace template with 6 overloads, simplify
As mentioned in the code comment in `kaitai/kaitaistream.cpp`, this approach follows what `std::to_string` is doing: https://en.cppreference.com/w/cpp/string/basic_string/to_string Avoiding templates and separating signed and unsigned cases has been already suggested by @webbnh in #50. The motivation for this change is that this new version of `to_string` also accepts values of unscoped enums that can be implicitly converted to integers, just like `std::to_string` does. For example, you can do `kaitai::kstream::to_string(FRUIT_APPLE)` just like you can do `std::to_string(FRUIT_APPLE)`, where `FRUIT_APPLE` is an unscoped enum member: ```cpp enum fruit_t { FRUIT_APPLE = 2, FRUIT_ORANGE = 5 }; ``` This simplifies the translation of expressions like `fruit::apple.to_i.to_s`, because it means that the `enum.to_i` operation can remain a no-op in the KSC: [`CppTranslator.scala:193-194`](https://github.com/kaitai-io/kaitai_struct_compiler/blob/12dbc32226eb9e94b76f803f3c8ff5f8943e5f8d/shared/src/main/scala/io/kaitai/struct/translators/CppTranslator.scala#L193-L194) ```scala override def enumToInt(v: expr, et: EnumType): String = translate(v) ``` Therefore, this change fixes the [EnumToI](https://github.com/kaitai-io/kaitai_struct_tests/blob/af0359aeb87233640fcd85b57850c712b6f03f63/formats/enum_to_i.ksy) and EnumToIInvalid tests for C++11, which currently fail with this error (https://github.com/kaitai-io/ci_artifacts/blob/786a45a62978409fd90fab124b5b1ad1f225a81e/test_out/cpp_stl_11/build-1.log#L1495-L1498): ``` /tests/compiled/cpp_stl_11/enum_to_i.cpp: In member function 'std::string enum_to_i_t::pet_1_i_to_s()': /tests/compiled/cpp_stl_11/enum_to_i.cpp:65:48: error: no matching function for call to 'kaitai::kstream::to_string(enum_to_i_t::animal_t)' 65 | m_pet_1_i_to_s = kaitai::kstream::to_string(pet_1()); | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~ ``` Eventually, we might switch to scoped enums (see kaitai-io/kaitai_struct#959), which are not implicitly convertible to integers, so then it won't be possible to keep the `.to_i` operation implemented as a no-op, but for now (as long as we're using unscoped enums) it works.
1 parent dbe1ca1 commit 90bd670

File tree

3 files changed

+199
-100
lines changed

3 files changed

+199
-100
lines changed

kaitai/kaitaistream.cpp

Lines changed: 87 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -791,20 +791,96 @@ int kaitai::kstream::mod(int a, int b) {
791791
return r;
792792
}
793793

794-
void kaitai::kstream::unsigned_to_decimal(uint64_t number, char *buffer) {
795-
// Implementation from https://ideone.com/nrQfA8 by Alf P. Steinbach
794+
void kaitai::kstream::unsigned_to_decimal(uint64_t number, char *buf, std::size_t &buf_contents_start) {
795+
// Implementation inspired by https://ideone.com/nrQfA8 by Alf P. Steinbach
796796
// (see https://vitaut.net/posts/2013/integer-to-string-conversion-in-cplusplus/)
797-
if (number == 0) {
798-
*buffer++ = '0';
797+
do {
798+
buf[--buf_contents_start] = static_cast<char>('0' + (number % 10));
799+
number /= 10;
800+
} while (number != 0);
801+
}
802+
803+
std::string kaitai::kstream::to_string_signed(int64_t val) {
804+
// `digits10 + 2` because of minus sign + leading digit (NB: null terminator is not used)
805+
char buf[std::numeric_limits<int64_t>::digits10 + 2];
806+
std::size_t buf_contents_start = sizeof(buf);
807+
if (val < 0) {
808+
// NB: `val` is negative and we need to get its absolute value (i.e. minus `val`). However, since
809+
// `int64_t` uses two's complement representation, its range is `[-2**63, 2**63 - 1] =
810+
// [-0x8000_0000_0000_0000, 0x7fff_ffff_ffff_ffff]` (both ends inclusive) and thus the naive
811+
// `-val` operation will overflow for `val = std::numeric_limits<int64_t>::min() =
812+
// -0x8000_0000_0000_0000` (because the result of `-val` is mathematically
813+
// `-(-0x8000_0000_0000_0000) = 0x8000_0000_0000_0000`, but the `int64_t` type can represent at
814+
// most `0x7fff_ffff_ffff_ffff`). And signed integer overflow is undefined behavior in C++.
815+
//
816+
// To avoid undefined behavior for `val = -0x8000_0000_0000_0000 = -2**63`, we do the following
817+
// steps for all negative `val`s:
818+
//
819+
// 1. Convert the signed (and negative) `val` to an unsigned `uint64_t` type. This is a
820+
// well-defined operation in C++: the resulting `uint64_t` value will be `val mod 2**64` (`mod`
821+
// is modulo). The maximum `val` we can have here is `-1` (because `val < 0`), a theoretical
822+
// minimum we are able to support would be `-2**64 + 1 = -0xffff_ffff_ffff_ffff` (even though
823+
// in practice the widest standard type is `int64_t` with the minimum of `-2**63`):
824+
//
825+
// * `static_cast<uint64_t>(-1) = -1 mod 2**64 = 2**64 + (-1) = 0xffff_ffff_ffff_ffff = 2**64 - 1`
826+
// * `static_cast<uint64_t>(-2**64 + 1) = (-2**64 + 1) mod 2**64 = 2**64 + (-2**64 + 1) = 1`
827+
//
828+
// 2. Subtract `static_cast<uint64_t>(val)` from `2**64 - 1 = 0xffff_ffff_ffff_ffff`. Since
829+
// `static_cast<uint64_t>(val)` is in range `[1, 2**64 - 1]` (see step 1), the result of this
830+
// subtraction will be mathematically in range `[0, (2**64 - 1) - 1] = [0, 2**64 - 2]`. So the
831+
// mathematical result cannot be negative, hence this unsigned integer subtraction can never
832+
// wrap around (which wouldn't be a good thing to rely upon because it confuses programmers and
833+
// code analysis tools).
834+
//
835+
// 3. Since we did mathematically `(2**64 - 1) - (2**64 + val) = -val - 1` so far (and we wanted
836+
// to do `-val`), we add `1` to correct that. From step 2 we know that the result of `-val - 1`
837+
// is in range `[0, 2**64 - 2]`, so adding `1` will not wrap (at most we could get `2**64 - 1 =
838+
// 0xffff_ffff_ffff_ffff`, which is still in the valid range of `uint64_t`).
839+
unsigned_to_decimal((std::numeric_limits<uint64_t>::max() - static_cast<uint64_t>(val)) + 1, buf, buf_contents_start);
840+
841+
buf[--buf_contents_start] = '-';
799842
} else {
800-
char *p_first = buffer;
801-
while (number != 0) {
802-
*buffer++ = static_cast<char>('0' + number % 10);
803-
number /= 10;
804-
}
805-
std::reverse(p_first, buffer);
843+
unsigned_to_decimal(static_cast<uint64_t>(val), buf, buf_contents_start);
806844
}
807-
*buffer = '\0';
845+
return std::string(&buf[buf_contents_start], sizeof(buf) - buf_contents_start);
846+
}
847+
848+
std::string kaitai::kstream::to_string_unsigned(uint64_t val) {
849+
// `digits10 + 1` because of leading digit (NB: null terminator is not used)
850+
char buf[std::numeric_limits<uint64_t>::digits10 + 1];
851+
std::size_t buf_contents_start = sizeof(buf);
852+
unsigned_to_decimal(val, buf, buf_contents_start);
853+
return std::string(&buf[buf_contents_start], sizeof(buf) - buf_contents_start);
854+
}
855+
856+
// NB: the following 6 overloads are exactly the ones that
857+
// [`std::to_string`](https://en.cppreference.com/w/cpp/string/basic_string/to_string) has.
858+
// Testing has shown that they are all necessary: if you remove any of them, you will get
859+
// something like `error: call to 'to_string' is ambiguous` when trying to call `to_string`
860+
// with the integer type for which you removed the overload.
861+
862+
std::string kaitai::kstream::to_string(int val) {
863+
return to_string_signed(val);
864+
}
865+
866+
std::string kaitai::kstream::to_string(long val) {
867+
return to_string_signed(val);
868+
}
869+
870+
std::string kaitai::kstream::to_string(long long val) {
871+
return to_string_signed(val);
872+
}
873+
874+
std::string kaitai::kstream::to_string(unsigned val) {
875+
return to_string_unsigned(val);
876+
}
877+
878+
std::string kaitai::kstream::to_string(unsigned long val) {
879+
return to_string_unsigned(val);
880+
}
881+
882+
std::string kaitai::kstream::to_string(unsigned long long val) {
883+
return to_string_unsigned(val);
808884
}
809885

810886
int64_t kaitai::kstream::string_to_int(const std::string& str, int base) {

kaitai/kaitaistream.h

Lines changed: 41 additions & 65 deletions
Original file line numberDiff line numberDiff line change
@@ -12,14 +12,10 @@
1212
#include <stdint.h> // int8_t, int16_t, int32_t, int64_t, uint8_t, uint16_t, uint32_t, uint64_t
1313

1414
#include <ios> // std::streamsize, forward declaration of std::istream // IWYU pragma: keep
15-
#include <limits> // std::numeric_limits
15+
#include <cstddef> // std::size_t
1616
#include <sstream> // std::istringstream // IWYU pragma: keep
1717
#include <string> // std::string
1818

19-
#ifdef KAITAI_STREAM_H_CPP11_SUPPORT
20-
#include <type_traits> // std::enable_if, std::is_integral
21-
#endif
22-
2319
namespace kaitai {
2420

2521
/**
@@ -234,67 +230,45 @@ class kstream {
234230

235231
/**
236232
* Converts given integer `val` to a decimal string representation.
237-
* Should be used in place of std::to_string() (which is available only
233+
* Should be used in place of `std::to_string(int)` (which is available only
238234
* since C++11) in older C++ implementations.
239235
*/
240-
template<typename I>
241-
#ifdef KAITAI_STREAM_H_CPP11_SUPPORT
242-
// https://stackoverflow.com/a/27913885
243-
typename std::enable_if<
244-
std::is_integral<I>::value &&
245-
// check if we don't have something too large like GCC's `__int128_t`
246-
std::numeric_limits<I>::max() >= 0 &&
247-
std::numeric_limits<I>::max() <= std::numeric_limits<uint64_t>::max(),
248-
std::string
249-
>::type
250-
#else
251-
std::string
252-
#endif
253-
static to_string(I val) {
254-
// in theory, `digits10 + 3` would be enough (minus sign + leading digit
255-
// + null terminator), but let's add a little more to be safe
256-
char buf[std::numeric_limits<I>::digits10 + 5];
257-
if (val < 0) {
258-
buf[0] = '-';
259-
260-
// NB: `val` is negative and we need to get its absolute value (i.e. minus `val`). However, since
261-
// `int64_t` uses two's complement representation, its range is `[-2**63, 2**63 - 1] =
262-
// [-0x8000_0000_0000_0000, 0x7fff_ffff_ffff_ffff]` (both ends inclusive) and thus the naive
263-
// `-val` operation will overflow for `val = std::numeric_limits<int64_t>::min() =
264-
// -0x8000_0000_0000_0000` (because the result of `-val` is mathematically
265-
// `-(-0x8000_0000_0000_0000) = 0x8000_0000_0000_0000`, but the `int64_t` type can represent at
266-
// most `0x7fff_ffff_ffff_ffff`). And signed integer overflow is undefined behavior in C++.
267-
//
268-
// To avoid undefined behavior for `val = -0x8000_0000_0000_0000 = -2**63`, we do the following
269-
// steps for all negative `val`s:
270-
//
271-
// 1. Convert the signed (and negative) `val` to an unsigned `uint64_t` type. This is a
272-
// well-defined operation in C++: the resulting `uint64_t` value will be `val mod 2**64` (`mod`
273-
// is modulo). The maximum `val` we can have here is `-1` (because `val < 0`), a theoretical
274-
// minimum we are able to support would be `-2**64 + 1 = -0xffff_ffff_ffff_ffff` (even though
275-
// in practice the widest standard type is `int64_t` with the minimum of `-2**63`):
276-
//
277-
// * `static_cast<uint64_t>(-1) = -1 mod 2**64 = 2**64 + (-1) = 0xffff_ffff_ffff_ffff = 2**64 - 1`
278-
// * `static_cast<uint64_t>(-2**64 + 1) = (-2**64 + 1) mod 2**64 = 2**64 + (-2**64 + 1) = 1`
279-
//
280-
// 2. Subtract `static_cast<uint64_t>(val)` from `2**64 - 1 = 0xffff_ffff_ffff_ffff`. Since
281-
// `static_cast<uint64_t>(val)` is in range `[1, 2**64 - 1]` (see step 1), the result of this
282-
// subtraction will be mathematically in range `[0, (2**64 - 1) - 1] = [0, 2**64 - 2]`. So the
283-
// mathematical result cannot be negative, hence this unsigned integer subtraction can never
284-
// wrap around (which wouldn't be a good thing to rely upon because it confuses programmers and
285-
// code analysis tools).
286-
//
287-
// 3. Since we did mathematically `(2**64 - 1) - (2**64 + val) = -val - 1` so far (and we wanted
288-
// to do `-val`), we add `1` to correct that. From step 2 we know that the result of `-val - 1`
289-
// is in range `[0, 2**64 - 2]`, so adding `1` will not wrap (at most we could get `2**64 - 1 =
290-
// 0xffff_ffff_ffff_ffff`, which is still in the valid range of `uint64_t`).
291-
292-
unsigned_to_decimal((std::numeric_limits<uint64_t>::max() - static_cast<uint64_t>(val)) + 1, &buf[1]);
293-
} else {
294-
unsigned_to_decimal(val, buf);
295-
}
296-
return std::string(buf);
297-
}
236+
static std::string to_string(int val);
237+
238+
/**
239+
* Converts given integer `val` to a decimal string representation.
240+
* Should be used in place of `std::to_string(long)` (which is available only
241+
* since C++11) in older C++ implementations.
242+
*/
243+
static std::string to_string(long val);
244+
245+
/**
246+
* Converts given integer `val` to a decimal string representation.
247+
* Should be used in place of `std::to_string(long long)` (which is available only
248+
* since C++11) in older C++ implementations.
249+
*/
250+
static std::string to_string(long long val);
251+
252+
/**
253+
* Converts given integer `val` to a decimal string representation.
254+
* Should be used in place of `std::to_string(unsigned)` (which is available only
255+
* since C++11) in older C++ implementations.
256+
*/
257+
static std::string to_string(unsigned val);
258+
259+
/**
260+
* Converts given integer `val` to a decimal string representation.
261+
* Should be used in place of `std::to_string(unsigned long)` (which is available only
262+
* since C++11) in older C++ implementations.
263+
*/
264+
static std::string to_string(unsigned long val);
265+
266+
/**
267+
* Converts given integer `val` to a decimal string representation.
268+
* Should be used in place of `std::to_string(unsigned long long)` (which is available only
269+
* since C++11) in older C++ implementations.
270+
*/
271+
static std::string to_string(unsigned long long val);
298272

299273
/**
300274
* Converts string `str` to an integer value. Throws an exception if the
@@ -347,7 +321,9 @@ class kstream {
347321
void init();
348322
void exceptions_enable() const;
349323

350-
static void unsigned_to_decimal(uint64_t number, char *buffer);
324+
static void unsigned_to_decimal(uint64_t number, char *buf, std::size_t &buf_contents_start);
325+
static std::string to_string_signed(int64_t val);
326+
static std::string to_string_unsigned(uint64_t val);
351327

352328
#ifdef KS_STR_ENCODING_WIN32API
353329
enum {

tests/unittest.cpp

Lines changed: 71 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -66,52 +66,99 @@ TEST(KaitaiStreamTest, to_string)
6666
EXPECT_EQ(kaitai::kstream::to_string(-123), "-123");
6767
}
6868

69-
TEST(KaitaiStreamTest, to_string_uint8)
69+
// Since `kstream::to_string` must have several overloads (just like
70+
// [`std::to_string`](https://en.cppreference.com/w/cpp/string/basic_string/to_string)) to
71+
// cover all [standard integer
72+
// types](https://en.cppreference.com/w/cpp/language/types#Properties) while avoiding
73+
// templates, it's a good idea to test whether it actually works with each standard
74+
// integer type. If even just one of the 6 required overloads is missing or not working,
75+
// these tests should be able to detect it.
76+
//
77+
// We test the standard integer types (keywords), not [fixed width integer
78+
// types](https://en.cppreference.com/w/cpp/header/cstdint) (like `int32_t`), because then
79+
// we could potentially have a blind spot: `int32_t` tends to be almost universally
80+
// equivalent to `int`, but `int64_t` is either `long` (typically on 64-bit Linux) or
81+
// `long long` (typically on 64-bit Windows) but not both. So I believe that using
82+
// standard integer types gives us better coverage.
83+
84+
TEST(KaitaiStreamTest, to_string_unsigned_char)
7085
{
71-
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<uint8_t>::min()), "0");
72-
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<uint8_t>::max()), "255");
86+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<unsigned char>::min()), "0");
87+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<unsigned char>::max()), "255");
7388
}
7489

75-
TEST(KaitaiStreamTest, to_string_int8)
90+
TEST(KaitaiStreamTest, to_string_signed_char)
7691
{
77-
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<int8_t>::min()), "-128");
78-
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<int8_t>::max()), "127");
92+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<signed char>::min()), "-128");
93+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<signed char>::max()), "127");
7994
}
8095

81-
TEST(KaitaiStreamTest, to_string_uint16)
96+
TEST(KaitaiStreamTest, to_string_unsigned_short)
8297
{
83-
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<uint16_t>::min()), "0");
84-
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<uint16_t>::max()), "65535");
98+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<unsigned short>::min()), "0");
99+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<unsigned short>::max()), "65535");
85100
}
86101

87-
TEST(KaitaiStreamTest, to_string_int16)
102+
TEST(KaitaiStreamTest, to_string_short)
88103
{
89-
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<int16_t>::min()), "-32768");
90-
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<int16_t>::max()), "32767");
104+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<short>::min()), "-32768");
105+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<short>::max()), "32767");
91106
}
92107

93-
TEST(KaitaiStreamTest, to_string_uint32)
108+
TEST(KaitaiStreamTest, to_string_unsigned)
94109
{
95-
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<uint32_t>::min()), "0");
96-
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<uint32_t>::max()), "4294967295");
110+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<unsigned>::min()), "0");
111+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<unsigned>::max()), "4294967295");
97112
}
98113

99-
TEST(KaitaiStreamTest, to_string_int32)
114+
TEST(KaitaiStreamTest, to_string_int)
100115
{
101-
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<int32_t>::min()), "-2147483648");
102-
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<int32_t>::max()), "2147483647");
116+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<int>::min()), "-2147483648");
117+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<int>::max()), "2147483647");
118+
}
119+
120+
#ifdef _MSC_VER
121+
#pragma warning(push)
122+
// Disable `warning C4127: conditional expression is constant`
123+
// (see https://learn.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-4-c4127?view=msvc-170)
124+
#pragma warning(disable: 4127)
125+
#endif
126+
127+
TEST(KaitaiStreamTest, to_string_unsigned_long)
128+
{
129+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<unsigned long>::min()), "0");
130+
if (sizeof(unsigned long) == 4) {
131+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<unsigned long>::max()), "4294967295");
132+
} else {
133+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<unsigned long>::max()), "18446744073709551615");
134+
}
103135
}
104136

105-
TEST(KaitaiStreamTest, to_string_uint64)
137+
TEST(KaitaiStreamTest, to_string_long)
138+
{
139+
if (sizeof(long) == 4) {
140+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<long>::min()), "-2147483648");
141+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<long>::max()), "2147483647");
142+
} else {
143+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<long>::min()), "-9223372036854775808");
144+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<long>::max()), "9223372036854775807");
145+
}
146+
}
147+
148+
#ifdef _MSC_VER
149+
#pragma warning(pop)
150+
#endif
151+
152+
TEST(KaitaiStreamTest, to_string_unsigned_long_long)
106153
{
107-
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<uint64_t>::min()), "0");
108-
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<uint64_t>::max()), "18446744073709551615");
154+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<unsigned long long>::min()), "0");
155+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<unsigned long long>::max()), "18446744073709551615");
109156
}
110157

111-
TEST(KaitaiStreamTest, to_string_int64)
158+
TEST(KaitaiStreamTest, to_string_long_long)
112159
{
113-
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<int64_t>::min()), "-9223372036854775808");
114-
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<int64_t>::max()), "9223372036854775807");
160+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<long long>::min()), "-9223372036854775808");
161+
EXPECT_EQ(kaitai::kstream::to_string(std::numeric_limits<long long>::max()), "9223372036854775807");
115162
}
116163

117164
TEST(KaitaiStreamTest, string_to_int)

0 commit comments

Comments
 (0)