[libc++] Mostly Implement P1885R12: <text_encoding>
#141312
base: main
@llvm/pr-subscribers-libcxx
Author: William Tran-Viet (smallp-o-p)
Changes
Resolve #105373 and consequently #118371
First crack at
Patch is 118.46 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/141312.diff
32 Files Affected:
diff --git a/libcxx/docs/FeatureTestMacroTable.rst b/libcxx/docs/FeatureTestMacroTable.rst
index 9b57b7c8eeb52..93308e4078075 100644
--- a/libcxx/docs/FeatureTestMacroTable.rst
+++ b/libcxx/docs/FeatureTestMacroTable.rst
@@ -500,7 +500,7 @@ Status
---------------------------------------------------------- -----------------
``__cpp_lib_submdspan`` *unimplemented*
---------------------------------------------------------- -----------------
- ``__cpp_lib_text_encoding`` *unimplemented*
+ ``__cpp_lib_text_encoding`` ``202306L``
---------------------------------------------------------- -----------------
``__cpp_lib_to_chars`` *unimplemented*
---------------------------------------------------------- -----------------
diff --git a/libcxx/docs/Status/Cxx2cPapers.csv b/libcxx/docs/Status/Cxx2cPapers.csv
index 3809446a57896..a7dfa75df7c87 100644
--- a/libcxx/docs/Status/Cxx2cPapers.csv
+++ b/libcxx/docs/Status/Cxx2cPapers.csv
@@ -13,7 +13,7 @@
"`P2013R5 <https://wg21.link/P2013R5>`__","Freestanding Language: Optional ``::operator new``","2023-06 (Varna)","","",""
"`P2363R5 <https://wg21.link/P2363R5>`__","Extending associative containers with the remaining heterogeneous overloads","2023-06 (Varna)","","",""
"`P1901R2 <https://wg21.link/P1901R2>`__","Enabling the Use of ``weak_ptr`` as Keys in Unordered Associative Containers","2023-06 (Varna)","","",""
-"`P1885R12 <https://wg21.link/P1885R12>`__","Naming Text Encodings to Demystify Them","2023-06 (Varna)","","",""
+"`P1885R12 <https://wg21.link/P1885R12>`__","Naming Text Encodings to Demystify Them","2023-06 (Varna)","|Complete|","21",""
"`P0792R14 <https://wg21.link/P0792R14>`__","``function_ref``: a type-erased callable reference","2023-06 (Varna)","","",""
"`P2874R2 <https://wg21.link/P2874R2>`__","P2874R2: Mandating Annex D Require No More","2023-06 (Varna)","|Complete|","12",""
"`P2757R3 <https://wg21.link/P2757R3>`__","Type-checking format args","2023-06 (Varna)","","",""
@@ -79,7 +79,7 @@
"`P3136R1 <https://wg21.link/P3136R1>`__","Retiring niebloids","2024-11 (Wrocław)","|Complete|","14",""
"`P3138R5 <https://wg21.link/P3138R5>`__","``views::cache_latest``","2024-11 (Wrocław)","","",""
"`P3379R0 <https://wg21.link/P3379R0>`__","Constrain ``std::expected`` equality operators","2024-11 (Wrocław)","|Complete|","21",""
-"`P2862R1 <https://wg21.link/P2862R1>`__","``text_encoding::name()`` should never return null values","2024-11 (Wrocław)","","",""
+"`P2862R1 <https://wg21.link/P2862R1>`__","``text_encoding::name()`` should never return null values","2024-11 (Wrocław)","|Complete|","21",""
"`P2897R7 <https://wg21.link/P2897R7>`__","``aligned_accessor``: An ``mdspan`` accessor expressing pointer over-alignment","2024-11 (Wrocław)","|Complete|","21",""
"`P3355R1 <https://wg21.link/P3355R1>`__","Fix ``submdspan`` for C++26","2024-11 (Wrocław)","","",""
"`P3222R0 <https://wg21.link/P3222R0>`__","Fix C++26 by adding transposed special cases for P2642 layouts","2024-11 (Wrocław)","","",""
diff --git a/libcxx/include/CMakeLists.txt b/libcxx/include/CMakeLists.txt
index 43cefd5600646..ba61ee7c11e35 100644
--- a/libcxx/include/CMakeLists.txt
+++ b/libcxx/include/CMakeLists.txt
@@ -751,6 +751,7 @@ set(files
__system_error/error_condition.h
__system_error/system_error.h
__system_error/throw_system_error.h
+ __text_encoding/text_encoding.h
__thread/formatter.h
__thread/id.h
__thread/jthread.h
@@ -1062,6 +1063,7 @@ set(files
strstream
syncstream
system_error
+ text_encoding
tgmath.h
thread
tuple
diff --git a/libcxx/include/__locale b/libcxx/include/__locale
index d6c6ef19627ff..4da3f38ac408f 100644
--- a/libcxx/include/__locale
+++ b/libcxx/include/__locale
@@ -31,6 +31,10 @@
# include <cstddef>
# include <cstring>
+# if _LIBCPP_STD_VER >= 26
+# include <__text_encoding/text_encoding.h>
+# endif
+
# if _LIBCPP_HAS_WIDE_CHARACTERS
# include <cwchar>
# else
@@ -99,6 +103,11 @@ public:
// locale operations:
string name() const;
+
+# if _LIBCPP_STD_VER >= 26 && __CHAR_BIT__ == 8
+ text_encoding encoding() const;
+# endif // _LIBCPP_STD_VER >= 26
+
bool operator==(const locale&) const;
# if _LIBCPP_STD_VER <= 17
_LIBCPP_HIDE_FROM_ABI bool operator!=(const locale& __y) const { return !(*this == __y); }
diff --git a/libcxx/include/__text_encoding/text_encoding.h b/libcxx/include/__text_encoding/text_encoding.h
new file mode 100644
index 0000000000000..93d0ae2ab6b89
--- /dev/null
+++ b/libcxx/include/__text_encoding/text_encoding.h
@@ -0,0 +1,1483 @@
+// -*- C++ -*-
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___TEXT_ENCODING_TEXT_ENCODING_H
+#define _LIBCPP___TEXT_ENCODING_TEXT_ENCODING_H
+
+#include <__config>
+
+#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
+# pragma GCC system_header
+#endif
+
+#if _LIBCPP_HAS_LOCALIZATION
+
+#include <__algorithm/copy_n.h>
+#include <__algorithm/lower_bound.h>
+#include <__algorithm/min.h>
+#include <__functional/hash.h>
+#include <__iterator/iterator_traits.h>
+#include <__locale_dir/locale_base_api.h>
+#include <__ranges/view_interface.h>
+#include <__string/char_traits.h>
+#include <__utility/unreachable.h>
+#include <cstdint>
+#include <string_view>
+
+_LIBCPP_PUSH_MACROS
+#include <__undef_macros>
+
+#if _LIBCPP_STD_VER >= 26
+_LIBCPP_BEGIN_NAMESPACE_STD
+
+struct _LIBCPP_EXPORTED_FROM_ABI text_encoding {
+ static constexpr size_t max_name_length = 63;
+
+private:
+ struct __encoding_data {
+ using __id_rep _LIBCPP_NODEBUG = int_least32_t;
+ __id_rep __mib_rep;
+ const char* __name;
+
+ friend constexpr bool operator==(const __encoding_data& __e, const __encoding_data& __other) _NOEXCEPT {
+ return __e.__mib_rep == __other.__mib_rep || __comp_name(__e.__name, __other.__name);
+ }
+
+ friend constexpr bool operator<(const __encoding_data& __e, const __id_rep __i) _NOEXCEPT {
+ return __e.__mib_rep < __i;
+ }
+ };
+
+public:
+ enum class id : __encoding_data::__id_rep {
+ other = 1,
+ unknown = 2,
+ ASCII = 3,
+ ISOLatin1 = 4,
+ ISOLatin2 = 5,
+ ISOLatin3 = 6,
+ ISOLatin4 = 7,
+ ISOLatinCyrillic = 8,
+ ISOLatinArabic = 9,
+ ISOLatinGreek = 10,
+ ISOLatinHebrew = 11,
+ ISOLatin5 = 12,
+ ISOLatin6 = 13,
+ ISOTextComm = 14,
+ HalfWidthKatakana = 15,
+ JISEncoding = 16,
+ ShiftJIS = 17,
+ EUCPkdFmtJapanese = 18,
+ EUCFixWidJapanese = 19,
+ ISO4UnitedKingdom = 20,
+ ISO11SwedishForNames = 21,
+ ISO15Italian = 22,
+ ISO17Spanish = 23,
+ ISO21German = 24,
+ ISO60DanishNorwegian = 25,
+ ISO69French = 26,
+ ISO10646UTF1 = 27,
+ ISO646basic1983 = 28,
+ INVARIANT = 29,
+ ISO2IntlRefVersion = 30,
+ NATSSEFI = 31,
+ NATSSEFIADD = 32,
+ NATSDANO = 33,
+ NATSDANOADD = 34,
+ ISO10Swedish = 35,
+ KSC56011987 = 36,
+ ISO2022KR = 37,
+ EUCKR = 38,
+ ISO2022JP = 39,
+ ISO2022JP2 = 40,
+ ISO13JISC6220jp = 41,
+ ISO14JISC6220ro = 42,
+ ISO16Portuguese = 43,
+ ISO18Greek7Old = 44,
+ ISO19LatinGreek = 45,
+ ISO25French = 46,
+ ISO27LatinGreek1 = 47,
+ ISO5427Cyrillic = 48,
+ ISO42JISC62261978 = 49,
+ ISO47BSViewdata = 50,
+ ISO49INIS = 51,
+ ISO50INIS8 = 52,
+ ISO51INISCyrillic = 53,
+ ISO54271981 = 54,
+ ISO5428Greek = 55,
+ ISO57GB1988 = 56,
+ ISO58GB231280 = 57,
+ ISO61Norwegian2 = 58,
+ ISO70VideotexSupp1 = 59,
+ ISO84Portuguese2 = 60,
+ ISO85Spanish2 = 61,
+ ISO86Hungarian = 62,
+ ISO87JISX0208 = 63,
+ ISO88Greek7 = 64,
+ ISO89ASMO449 = 65,
+ ISO90 = 66,
+ ISO91JISC62291984a = 67,
+ ISO92JISC62991984b = 68,
+ ISO93JIS62291984badd = 69,
+ ISO94JIS62291984hand = 70,
+ ISO95JIS62291984handadd = 71,
+ ISO96JISC62291984kana = 72,
+ ISO2033 = 73,
+ ISO99NAPLPS = 74,
+ ISO102T617bit = 75,
+ ISO103T618bit = 76,
+ ISO111ECMACyrillic = 77,
+ ISO121Canadian1 = 78,
+ ISO122Canadian2 = 79,
+ ISO123CSAZ24341985gr = 80,
+ ISO88596E = 81,
+ ISO88596I = 82,
+ ISO128T101G2 = 83,
+ ISO88598E = 84,
+ ISO88598I = 85,
+ ISO139CSN369103 = 86,
+ ISO141JUSIB1002 = 87,
+ ISO143IECP271 = 88,
+ ISO146Serbian = 89,
+ ISO147Macedonian = 90,
+ ISO150 = 91,
+ ISO151Cuba = 92,
+ ISO6937Add = 93,
+ ISO153GOST1976874 = 94,
+ ISO8859Supp = 95,
+ ISO10367Box = 96,
+ ISO158Lap = 97,
+ ISO159JISX02121990 = 98,
+ ISO646Danish = 99,
+ USDK = 100,
+ DKUS = 101,
+ KSC5636 = 102,
+ Unicode11UTF7 = 103,
+ ISO2022CN = 104,
+ ISO2022CNEXT = 105,
+ UTF8 = 106,
+ ISO885913 = 109,
+ ISO885914 = 110,
+ ISO885915 = 111,
+ ISO885916 = 112,
+ GBK = 113,
+ GB18030 = 114,
+ OSDEBCDICDF0415 = 115,
+ OSDEBCDICDF03IRV = 116,
+ OSDEBCDICDF041 = 117,
+ ISO115481 = 118,
+ KZ1048 = 119,
+ UCS2 = 1000,
+ UCS4 = 1001,
+ UnicodeASCII = 1002,
+ UnicodeLatin1 = 1003,
+ UnicodeJapanese = 1004,
+ UnicodeIBM1261 = 1005,
+ UnicodeIBM1268 = 1006,
+ UnicodeIBM1276 = 1007,
+ UnicodeIBM1264 = 1008,
+ UnicodeIBM1265 = 1009,
+ Unicode11 = 1010,
+ SCSU = 1011,
+ UTF7 = 1012,
+ UTF16BE = 1013,
+ UTF16LE = 1014,
+ UTF16 = 1015,
+ CESU8 = 1016,
+ UTF32 = 1017,
+ UTF32BE = 1018,
+ UTF32LE = 1019,
+ BOCU1 = 1020,
+ UTF7IMAP = 1021,
+ Windows30Latin1 = 2000,
+ Windows31Latin1 = 2001,
+ Windows31Latin2 = 2002,
+ Windows31Latin5 = 2003,
+ HPRoman8 = 2004,
+ AdobeStandardEncoding = 2005,
+ VenturaUS = 2006,
+ VenturaInternational = 2007,
+ DECMCS = 2008,
+ PC850Multilingual = 2009,
+ PC8DanishNorwegian = 2012,
+ PC862LatinHebrew = 2013,
+ PC8Turkish = 2014,
+ IBMSymbols = 2015,
+ IBMThai = 2016,
+ HPLegal = 2017,
+ HPPiFont = 2018,
+ HPMath8 = 2019,
+ HPPSMath = 2020,
+ HPDesktop = 2021,
+ VenturaMath = 2022,
+ MicrosoftPublishing = 2023,
+ Windows31J = 2024,
+ GB2312 = 2025,
+ Big5 = 2026,
+ Macintosh = 2027,
+ IBM037 = 2028,
+ IBM038 = 2029,
+ IBM273 = 2030,
+ IBM274 = 2031,
+ IBM275 = 2032,
+ IBM277 = 2033,
+ IBM278 = 2034,
+ IBM280 = 2035,
+ IBM281 = 2036,
+ IBM284 = 2037,
+ IBM285 = 2038,
+ IBM290 = 2039,
+ IBM297 = 2040,
+ IBM420 = 2041,
+ IBM423 = 2042,
+ IBM424 = 2043,
+ PC8CodePage437 = 2011,
+ IBM500 = 2044,
+ IBM851 = 2045,
+ PCp852 = 2010,
+ IBM855 = 2046,
+ IBM857 = 2047,
+ IBM860 = 2048,
+ IBM861 = 2049,
+ IBM863 = 2050,
+ IBM864 = 2051,
+ IBM865 = 2052,
+ IBM868 = 2053,
+ IBM869 = 2054,
+ IBM870 = 2055,
+ IBM871 = 2056,
+ IBM880 = 2057,
+ IBM891 = 2058,
+ IBM903 = 2059,
+ IBBM904 = 2060,
+ IBM905 = 2061,
+ IBM918 = 2062,
+ IBM1026 = 2063,
+ IBMEBCDICATDE = 2064,
+ EBCDICATDEA = 2065,
+ EBCDICCAFR = 2066,
+ EBCDICDKNO = 2067,
+ EBCDICDKNOA = 2068,
+ EBCDICFISE = 2069,
+ EBCDICFISEA = 2070,
+ EBCDICFR = 2071,
+ EBCDICIT = 2072,
+ EBCDICPT = 2073,
+ EBCDICES = 2074,
+ EBCDICESA = 2075,
+ EBCDICESS = 2076,
+ EBCDICUK = 2077,
+ EBCDICUS = 2078,
+ Unknown8BiT = 2079,
+ Mnemonic = 2080,
+ Mnem = 2081,
+ VISCII = 2082,
+ VIQR = 2083,
+ KOI8R = 2084,
+ HZGB2312 = 2085,
+ IBM866 = 2086,
+ PC775Baltic = 2087,
+ KOI8U = 2088,
+ IBM00858 = 2089,
+ IBM00924 = 2090,
+ IBM01140 = 2091,
+ IBM01141 = 2092,
+ IBM01142 = 2093,
+ IBM01143 = 2094,
+ IBM01144 = 2095,
+ IBM01145 = 2096,
+ IBM01146 = 2097,
+ IBM01147 = 2098,
+ IBM01148 = 2099,
+ IBM01149 = 2100,
+ Big5HKSCS = 2101,
+ IBM1047 = 2102,
+ PTCP154 = 2103,
+ Amiga1251 = 2104,
+ KOI7switched = 2105,
+ BRF = 2106,
+ TSCII = 2107,
+ CP51932 = 2108,
+ windows874 = 2109,
+ windows1250 = 2250,
+ windows1251 = 2251,
+ windows1252 = 2252,
+ windows1253 = 2253,
+ windows1254 = 2254,
+ windows1255 = 2255,
+ windows1256 = 2256,
+ windows1257 = 2257,
+ windows1258 = 2258,
+ TIS620 = 2259,
+ CP50220 = 2260,
+ reserved = 3000
+ };
+
+ using enum id;
+
+ _LIBCPP_HIDE_FROM_ABI constexpr text_encoding() = default;
+ _LIBCPP_HIDE_FROM_ABI constexpr explicit text_encoding(string_view __enc) _NOEXCEPT
+ : __encoding_rep_(__find_encoding_data(__enc)) {
+ __enc.copy(__name_, max_name_length, 0);
+ }
+ _LIBCPP_HIDE_FROM_ABI constexpr text_encoding(id __i) _NOEXCEPT : __encoding_rep_(__find_encoding_data_by_id(__i)) {
+ if (__encoding_rep_->__name[0] != '\0')
+ std::copy_n(__encoding_rep_->__name, std::char_traits<char>::length(__encoding_rep_->__name), __name_);
+ }
+
+ [[nodiscard]] _LIBCPP_HIDE_FROM_ABI constexpr id mib() const _NOEXCEPT { return id(__encoding_rep_->__mib_rep); }
+ [[nodiscard]] _LIBCPP_HIDE_FROM_ABI constexpr const char* name() const _NOEXCEPT { return __name_; }
+
+ // [text.encoding.aliases], class text_encoding::aliases_view
+ struct aliases_view : ranges::view_interface<aliases_view> {
+ constexpr aliases_view() = default;
+ constexpr aliases_view(const __encoding_data* __d) : __view_data_(__d) {}
+ struct __end_sentinel {};
+ struct __iterator {
+ using value_type = const char*;
+ using reference = const char*;
+ using difference_type = ptrdiff_t;
+
+ _LIBCPP_HIDE_FROM_ABI constexpr __iterator() noexcept = default;
+
+ _LIBCPP_HIDE_FROM_ABI constexpr value_type operator*() const {
+ if (__can_dereference())
+ return __data_->__name;
+ std::unreachable();
+ }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr value_type operator[](difference_type __n) const {
+ auto __it = *this;
+ return *(__it + __n);
+ }
+
+ _LIBCPP_HIDE_FROM_ABI friend constexpr __iterator operator+(__iterator __it, difference_type __n) {
+ __it += __n;
+ return __it;
+ }
+
+ _LIBCPP_HIDE_FROM_ABI friend constexpr __iterator operator+(difference_type __n, __iterator __it) {
+ __it += __n;
+ return __it;
+ }
+
+ _LIBCPP_HIDE_FROM_ABI friend constexpr __iterator operator-(__iterator __it, difference_type __n) {
+ __it -= __n;
+ return __it;
+ }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr difference_type operator-(const __iterator& __other) const
+ {
+ if(__other.__mib_rep_ == __mib_rep_)
+ return __mib_rep_ - __other.__mib_rep_;
+ std::unreachable();
+ }
+
+ _LIBCPP_HIDE_FROM_ABI friend constexpr __iterator operator-(difference_type __n, __iterator& __it) {
+ __it -= __n;
+ return __it;
+ }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr __iterator& operator++() {
+ __data_++;
+ return *this;
+ }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr __iterator operator++(int) {
+ auto __old = *this;
+ __data_++;
+ return __old;
+ }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr __iterator& operator--() {
+ __data_--;
+ return *this;
+ }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr __iterator operator--(int) {
+ auto __old = *this;
+ __data_--;
+ return __old;
+ }
+
+ // Check if going past the encoding data list array and if the new index has the same id, if not then
+ // replace it with a sentinel "out-of-bounds" iterator.
+ _LIBCPP_HIDE_FROM_ABI constexpr __iterator& operator+=(difference_type __n) {
+ if (__data_) [[__likely__]] {
+ if (__n > 0) {
+ if ((__data_ + __n) < std::end(__text_encoding_data) && __data_[__n - 1].__mib_rep == __mib_rep_)
+ __data_ += __n;
+ else
+ *this = __iterator{};
+ } else if (__n < 0) {
+ if ((__data_ + __n) > __text_encoding_data && __data_[__n].__mib_rep == __mib_rep_)
+ __data_ += __n;
+ else
+ *this = __iterator{};
+ }
+ }
+ return *this;
+ }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr __iterator& operator-=(difference_type __n) { return operator+=(-__n); }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr bool operator==(const __iterator& __it) const {
+ return __data_ == __it.__data_ && __it.__mib_rep_ == __mib_rep_;
+ }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr bool operator==(__end_sentinel) const { return !__can_dereference(); }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr auto operator<=>(__iterator __it) const { return __data_ <=> __it.__data_; }
+
+ private:
+ friend struct text_encoding;
+
+ _LIBCPP_HIDE_FROM_ABI constexpr __iterator(const __encoding_data* __enc_d) noexcept
+ ...
[truncated]
|
You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions h,,inc,cpp -- libcxx/include/__text_encoding/te_impl.h libcxx/include/text_encoding libcxx/src/text_encoding.cpp libcxx/test/libcxx/utilities/text_encoding/environment.pass.cpp libcxx/test/libcxx/utilities/text_encoding/text_encoding.members/environment.nodiscard.verify.cpp libcxx/test/libcxx/utilities/text_encoding/text_encoding.members/nodiscard.verify.cpp libcxx/test/std/language.support/support.limits/support.limits.general/text_encoding.version.compile.pass.cpp libcxx/test/std/localization/locales/locale/locale.members/encoding.pass.cpp libcxx/test/std/utilities/text_encoding/test_text_encoding.h libcxx/test/std/utilities/text_encoding/text_encoding.ctor/default.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.ctor/id.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.ctor/string_view.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.eq/equal.id.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.eq/equal.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.hash/enabled_hash.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.hash/hash.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.members/aliases_view.compile.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.members/environment.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.members/literal.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.members/text_encoding.aliases_view/begin.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.members/text_encoding.aliases_view/empty.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.members/text_encoding.aliases_view/end.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.members/text_encoding.aliases_view/front.pass.cpp 
libcxx/test/std/utilities/text_encoding/text_encoding.members/text_encoding.aliases_view/iterator.pass.cpp libcxx/test/std/utilities/text_encoding/trivially_copyable.compile.pass.cpp libcxx/include/__configuration/availability.h libcxx/include/__locale_dir/locale_base_api.h libcxx/include/__locale_dir/support/bsd_like.h libcxx/include/__locale_dir/support/linux.h libcxx/include/version libcxx/modules/std/text_encoding.inc libcxx/test/std/language.support/support.limits/support.limits.general/version.version.compile.pass.cpp libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_fr_FR.pass.cpp libcxx/test/support/platform_support.h --diff_from_common_commit
View the diff from clang-format here.
diff --git a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp
index 7fa1644e9..5b0b31f4b 100644
--- a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp
+++ b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp
@@ -33,7 +33,7 @@
// glibc <langinfo.h> has a THOUSANDS_SEP macro already defined
#ifdef THOUSANDS_SEP
-#undef THOUSANDS_SEP
+# undef THOUSANDS_SEP
#endif
#ifdef _AIX
@@ -544,8 +544,7 @@ int main(int, char**)
std::noshowbase(ios);
}
{ // negative, showbase
- std::wstring v =
- convert_thousands_sep(L"-1" THOUSANDS_SEP "234" THOUSANDS_SEP "567,89 \u20ac"); // EURO SIGN
+ std::wstring v = convert_thousands_sep(L"-1" THOUSANDS_SEP "234" THOUSANDS_SEP "567,89 \u20ac"); // EURO SIGN
std::showbase(ios);
typedef cpp17_input_iterator<const wchar_t*> I;
long double ex;
diff --git a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_fr_FR.pass.cpp b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_fr_FR.pass.cpp
index 06adf9b08..171a3ab9f 100644
--- a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_fr_FR.pass.cpp
+++ b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_fr_FR.pass.cpp
@@ -34,7 +34,7 @@
// glibc <langinfo.h> has a THOUSANDS_SEP macro already defined
#ifdef THOUSANDS_SEP
-#undef THOUSANDS_SEP
+# undef THOUSANDS_SEP
#endif
#ifdef _AIX
|
|
Thanks! I've edited the PR description to associate this PR with both issues. |
cor3ntin
left a comment
Exciting to see progress on this
I'm not a library maintainer, so take my comments for what they are worth :)
What's the Windows support for libc++ these days? environment is POSIX-specific at the moment.
libcxx/src/text_encoding.cpp
Outdated
You might want to check that you are in a POSIX environment here. nl_langinfo_l is not going to be available on Windows, for example.
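As a rough illustration of the kind of guard being asked for here, the sketch below only calls `nl_langinfo_l` when the platform reports POSIX.1-2008 support and otherwise returns an empty string. The helper name `codeset_name_or_empty` and the macro `HAS_POSIX_LANGINFO` are hypothetical and do not come from the patch; the real libc++ change may well route this through the locale base API instead.

```cpp
#include <cassert>
#include <string>

#if defined(__unix__) || defined(__APPLE__)
#  include <unistd.h> // defines _POSIX_VERSION on POSIX systems
#endif

// HAS_POSIX_LANGINFO is a hypothetical macro used only in this sketch.
#if defined(_POSIX_VERSION) && _POSIX_VERSION >= 200809L
#  include <langinfo.h>
#  include <locale.h>
#  define HAS_POSIX_LANGINFO 1
#else
#  define HAS_POSIX_LANGINFO 0
#endif

// Hypothetical helper (not part of the patch): returns the codeset name the
// C library reports for `locale_name`, or an empty string when the locale
// cannot be created or the platform is not treated as POSIX.1-2008.
inline std::string codeset_name_or_empty(const char* locale_name) {
  assert(locale_name != nullptr);
#if HAS_POSIX_LANGINFO
  locale_t loc = newlocale(LC_CTYPE_MASK, locale_name, locale_t());
  if (!loc)
    return {};
  std::string result = nl_langinfo_l(CODESET, loc);
  freelocale(loc);
  return result;
#else
  return {};
#endif
}
```

Platforms that report an older `_POSIX_VERSION`, or none at all, simply take the fallback branch in this sketch; a real implementation would need a platform-specific path there.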
This should probably be in the locale base API, since this is platform-specific and locale related.
I would just use strncpy here - but I don't know what libc++ folks prefer
I'd rather save the size and avoid this call entirely. I'm pretty sure we can get away with not even increasing the size of the struct, since it's at most 63.
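One possible reading of this suggestion, sketched under assumptions: because a name is at most 63 characters and the buffer is 64 bytes, the last byte can hold the remaining capacity (63 minus the length). When the name is full that byte is 0 and doubles as the null terminator, so the length is available without a `strlen`-style scan and without growing the struct. The type `name_buffer` below is hypothetical and is not the layout used in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical type for illustration only; not the patch's representation.
struct name_buffer {
  static constexpr std::size_t max_name_length = 63;

  // Precondition (mirrored with assert): s.size() <= max_name_length.
  constexpr explicit name_buffer(std::string_view s) noexcept {
    assert(s.size() <= max_name_length);
    for (std::size_t i = 0; i < s.size(); ++i)
      data_[i] = s[i];
    // Store the remaining capacity in the last byte. For a 63-character
    // name this is 0, which also serves as the terminating null.
    data_[max_name_length] = static_cast<char>(max_name_length - s.size());
  }

  // Length recovered from the last byte, without scanning the buffer.
  constexpr std::size_t size() const noexcept {
    return max_name_length - static_cast<unsigned char>(data_[max_name_length]);
  }

  // Shorter names are terminated by the zero-initialized bytes after them.
  constexpr const char* c_str() const noexcept { return data_; }

private:
  char data_[max_name_length + 1] = {};
};

static_assert(sizeof(name_buffer) == name_buffer::max_name_length + 1);
static_assert(name_buffer("UTF-8").size() == 5);
```

The trade-off in this sketch is that the last byte is no longer always zero, so code that reads the whole 64-byte array rather than the null-terminated string would see the stored count.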
That function was intended to be optimizable (e.g. for UTF-8) such that it would not access / odr-use the data table. But that implementation is fine, especially as a first pass.
To decrease the size at least somewhat, we could instead have a union of these two and set the last byte to a non-zero value if we store a pointer. The __name_ would be the same as __encoding_rep_ in that case IIUC.
I don't think this is feasible due to 2.2: in the case of a match found for the name passed in enc, we'd have to copy the name into the buffer and somehow be able to retrieve the id without the pointer.
Edit: We could still avoid the call to copy_n though, and change name() to check if the first character is a null terminator.
libcxx/src/text_encoding.cpp
Outdated
This should probably be in the locale base API, since this is platform-specific and locale related.
|
@smallp-o-p BTW You should use proper GitHub syntax to link the PR to the issues: https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue |
|
@smallp-o-p I think you might need to implement availability macros for macOS. I've done that just for free functions, so I'm not entirely sure if it applies for your case. Look for the |
|
The issue with
I'll have to dig in more with the availability macros, how |
I think there's another approach. Note that
The helper class and function can be available in old modes without exposing |
|
I have no idea what libc++ policies are in terms of backporting, but given that this feature is most useful for legacy systems, it might be reasonable to make it available in C++20 (we need consteval) - assuming there are appropriate warnings (note that gcc does not do that though). |
|
A couple notes based on the recent build failures:
It may just be better to use the draft implementation for |
I believe the latest released NDK. |
6b29be5 to 14317bf
8018cca to 4bcc67b
|
Marking as ready since I'm able to put some more time back into this. Kindly requesting re-reviews :) |
|
✅ With the latest revision this PR passed the Python code formatter. |
# include <__config>
# include <__functional/hash.h>
# include <__ranges/enable_borrowed_range.h>
# include <__text_encoding/te_impl.h>
Regarding te_impl.h: I think headers are granularized like that for a reason. I don't see one here. Is there any?
I put it in its own header mostly for organization purposes, because IMO it's a little easier to go through a file dedicated for the implementation only rather than having both the impl (which needs to be exposed in versions earlier than C++26) and the wrapper around it in the same file.
I'm not very strongly opinionated on this, so if the maintainers prefer it all in one file then I'm absolutely fine with that.
_LIBCPP_PUSH_MACROS
#include <__undef_macros>

#if _LIBCPP_STD_VER >= 23
Why is this "23"?
Wanted to minimize the number of people that could see te_impl but I may have been overly paranoid about it.
#ifdef _AIX
// the AIX libc expects U202F as LC_MONETARY thousands_sep
# define THOUSANDS_SEP L"\u202F"
# define THOUSANDS_SEP_ L"\u202F"
Why did you make this change? It seems unrelated and unnecessary. Unrelated changes should be done in separate patches.
glibc <langinfo.h> introduces a THOUSANDS_SEP which conflicted with this one. I'm fairly certain however, that it's a GNU extension...
using id = std::text_encoding::id;

int main(int, char**) {
#if false
??
Test for locale::encoding(), which I don't envision being able to implement in this PR. Wasn't sure whether to remove it or leave it in for the future. I do remember seeing similar tests for optional<T&> that were commented out like this (albeit those were parts of a greater test), but that was likely 8 years ago...so policy regarding stuff like this is most likely different now.
AFAIK we don't keep unused code because of "just in case" mostly. At the very least you need to add // TODO/FIXME and I think it's better to comment it out instead of #if false/0, because it is far more visible.
}
}

constexpr bool tests() {
- constexpr bool tests() {
+ constexpr bool test() {
Nit: The top-level test function is normally just "test()". Can you consistently use "test()"? I think it is sometimes test() and sometimes tests().
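For readers unfamiliar with the convention referred to here, a minimal sketch of the usual shape follows: a single `constexpr bool test()` holds the checks and returns `true`, so the same body can be evaluated at run time and at compile time. The check inside is a placeholder, not something taken from the PR's tests.

```cpp
#include <cassert>

// Top-level test function, named test() by convention.
constexpr bool test() {
  // Placeholder check standing in for the real assertions of a test file.
  constexpr int max_name_length = 63; // illustrative value only
  assert(max_name_length + 1 == 64);
  return true;
}

// Compile-time evaluation of the same checks.
static_assert(test());
```

In a test file, `main` would typically call `test()` once at run time as well, next to the `static_assert`.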
|
@H-G-Hristov Can you please use the "Start a review" feature to avoid spamming E-Mails? |
Partially resolves #105373 and resolves #118371
<text_encoding>
locale::encoding() is not implemented in this PR due to locale being (mostly) implemented in a source file, and at the moment text_encoding is unable to be exposed as a symbol in the library due to it being built with C++23.