- Notifications
You must be signed in to change notification settings - Fork 14.9k
[C++20][Modules] Implement P1857R3 Modules Dependency Discovery #107168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
…tructures to `IdentifierLoc` (#135808) I found this issue when I working on #107168. Currently we have many similiar data structures like: - `std::pair<IdentifierInfo *, SourceLocation>`. - Element type of `ModuleIdPath`. - `IdentifierLocPair`. - `IdentifierLoc`. This PR unify these data structures to `IdentifierLoc`, moved `IdentifierLoc` definition to SourceLocation.h, and deleted other similer data structures. --------- Signed-off-by: yronglin <yronglin777@gmail.com>
…like data structures to `IdentifierLoc` (#135808) I found this issue when I working on llvm/llvm-project#107168. Currently we have many similiar data structures like: - `std::pair<IdentifierInfo *, SourceLocation>`. - Element type of `ModuleIdPath`. - `IdentifierLocPair`. - `IdentifierLoc`. This PR unify these data structures to `IdentifierLoc`, moved `IdentifierLoc` definition to SourceLocation.h, and deleted other similer data structures. --------- Signed-off-by: yronglin <yronglin777@gmail.com>
… data structures to `IdentifierLoc` (#136077) This PR reland #135808, fixed some missed changes in LLDB. I found this issue when I working on #107168. Currently we have many similiar data structures like: - std::pair<IdentifierInfo *, SourceLocation>. - Element type of ModuleIdPath. - IdentifierLocPair. - IdentifierLoc. This PR unify these data structures to IdentifierLoc, moved IdentifierLoc definition to SourceLocation.h, and deleted other similer data structures. --------- Signed-off-by: yronglin <yronglin777@gmail.com>
…` pair-like data structures to `IdentifierLoc` (#136077) This PR reland llvm/llvm-project#135808, fixed some missed changes in LLDB. I found this issue when I working on llvm/llvm-project#107168. Currently we have many similiar data structures like: - std::pair<IdentifierInfo *, SourceLocation>. - Element type of ModuleIdPath. - IdentifierLocPair. - IdentifierLoc. This PR unify these data structures to IdentifierLoc, moved IdentifierLoc definition to SourceLocation.h, and deleted other similer data structures. --------- Signed-off-by: yronglin <yronglin777@gmail.com>
… data structures to `IdentifierLoc` (llvm#136077) This PR reland llvm#135808, fixed some missed changes in LLDB. I found this issue when I working on llvm#107168. Currently we have many similiar data structures like: - std::pair<IdentifierInfo *, SourceLocation>. - Element type of ModuleIdPath. - IdentifierLocPair. - IdentifierLoc. This PR unify these data structures to IdentifierLoc, moved IdentifierLoc definition to SourceLocation.h, and deleted other similer data structures. --------- Signed-off-by: yronglin <yronglin777@gmail.com>
Signed-off-by: yronglin <yronglin777@gmail.com>
dbc377e
to d300a2b
Compare @llvm/pr-subscribers-clang-driver @llvm/pr-subscribers-clang-modules Author: None (yronglin) ChangesImplement P1857R3 Modules Dependency Discovery.
At the start of phase 4 an import or module token is treated as starting a directive and are converted to their respective keywords iff:
Otherwise the token is treated as an identifier. Additionally:
Need add more test Patch is 134.74 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107168.diff 43 Files Affected:
diff --git a/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp b/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp index d872020c2d8a3..22a3eb97f938b 100644 --- a/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp +++ b/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp @@ -65,7 +65,7 @@ class PragmaAnnotateHandler : public PragmaHandler { Token Tok; PP.LexUnexpandedToken(Tok); if (Tok.isNot(tok::eod)) - PP.Diag(Tok, diag::ext_pp_extra_tokens_at_eol) << "pragma"; + PP.Diag(Tok, diag::ext_pp_extra_tokens_at_eol) << "#pragma"; if (HandledDecl) { DiagnosticsEngine &D = PP.getDiagnostics(); diff --git a/clang/include/clang/Basic/DiagnosticLexKinds.td b/clang/include/clang/Basic/DiagnosticLexKinds.td index 723f5d48b4f5f..f975a63b369b5 100644 --- a/clang/include/clang/Basic/DiagnosticLexKinds.td +++ b/clang/include/clang/Basic/DiagnosticLexKinds.td @@ -466,6 +466,8 @@ def err_pp_embed_device_file : Error< def ext_pp_extra_tokens_at_eol : ExtWarn< "extra tokens at end of #%0 directive">, InGroup<ExtraTokens>; +def ext_pp_extra_tokens_at_module_directive_eol : ExtWarn< + "extra tokens at end of '%0' directive">, InGroup<ExtraTokens>; def ext_pp_comma_expr : Extension<"comma operator in operand of #if">; def ext_pp_bad_vaargs_use : Extension< @@ -496,7 +498,7 @@ def warn_cxx98_compat_variadic_macro : Warning< def ext_named_variadic_macro : Extension< "named variadic macros are a GNU extension">, InGroup<VariadicMacros>; def err_embedded_directive : Error< - "embedding a #%0 directive within macro arguments is not supported">; + "embedding a %select{#|C++ }0%1 directive within macro arguments is not supported">; def ext_embedded_directive : Extension< "embedding a directive within macro arguments has undefined behavior">, InGroup<DiagGroup<"embedded-directive">>; @@ -983,6 +985,18 @@ def warn_module_conflict : Warning< InGroup<ModuleConflict>; // C++20 modules +def err_pp_expected_module_name_or_header_name : Error< + "expected module name or header name">; +def err_pp_expected_semi_after_module_or_import : Error< + "'%select{module|import}0' directive must end with a ';' on the same line">; +def err_module_decl_in_header : Error< + "module declaration must not come from an #include directive">; +def err_pp_cond_span_module_decl : Error< + "preprocessor conditionals shall not span a module declaration">; +def err_pp_module_expected_ident : Error< + "expected a module name after '%select{module|import}0'">; +def err_pp_unsupported_module_partition : Error< + "module partitions are only supported for C++20 onwards">; def err_header_import_semi_in_macro : Error< "semicolon terminating header import declaration cannot be produced " "by a macro">; diff --git a/clang/include/clang/Basic/DiagnosticParseKinds.td b/clang/include/clang/Basic/DiagnosticParseKinds.td index 3aa36ad59d0b9..c06e2f090b429 100644 --- a/clang/include/clang/Basic/DiagnosticParseKinds.td +++ b/clang/include/clang/Basic/DiagnosticParseKinds.td @@ -1760,8 +1760,8 @@ def ext_bit_int : Extension< } // end of Parse Issue category. let CategoryName = "Modules Issue" in { -def err_unexpected_module_decl : Error< - "module declaration can only appear at the top level">; +def err_unexpected_module_import_decl : Error< + "%select{module|import}0 declaration can only appear at the top level">; def err_module_expected_ident : Error< "expected a module name after '%select{module|import}0'">; def err_attribute_not_module_attr : Error< @@ -1782,8 +1782,6 @@ def err_module_fragment_exported : Error< def err_private_module_fragment_expected_semi : Error< "expected ';' after private module fragment declaration">; def err_missing_before_module_end : Error<"expected %0 at end of module">; -def err_unsupported_module_partition : Error< - "module partitions are only supported for C++20 onwards">; def err_import_not_allowed_here : Error< "imports must immediately follow the module declaration">; def err_partition_import_outside_module : Error< diff --git a/clang/include/clang/Basic/IdentifierTable.h b/clang/include/clang/Basic/IdentifierTable.h index 54540193cfcc0..add6c6ac629a1 100644 --- a/clang/include/clang/Basic/IdentifierTable.h +++ b/clang/include/clang/Basic/IdentifierTable.h @@ -179,6 +179,10 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo { LLVM_PREFERRED_TYPE(bool) unsigned IsModulesImport : 1; + // True if this is the 'module' contextual keyword. + LLVM_PREFERRED_TYPE(bool) + unsigned IsModulesDecl : 1; + // True if this is a mangled OpenMP variant name. LLVM_PREFERRED_TYPE(bool) unsigned IsMangledOpenMPVariantName : 1; @@ -215,8 +219,9 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo { IsCPPOperatorKeyword(false), NeedsHandleIdentifier(false), IsFromAST(false), ChangedAfterLoad(false), FEChangedAfterLoad(false), RevertedTokenID(false), OutOfDate(false), IsModulesImport(false), - IsMangledOpenMPVariantName(false), IsDeprecatedMacro(false), - IsRestrictExpansion(false), IsFinal(false), IsKeywordInCpp(false) {} + IsModulesDecl(false), IsMangledOpenMPVariantName(false), + IsDeprecatedMacro(false), IsRestrictExpansion(false), IsFinal(false), + IsKeywordInCpp(false) {} public: IdentifierInfo(const IdentifierInfo &) = delete; @@ -528,6 +533,18 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo { RecomputeNeedsHandleIdentifier(); } + /// Determine whether this is the contextual keyword \c module. + bool isModulesDeclaration() const { return IsModulesDecl; } + + /// Set whether this identifier is the contextual keyword \c module. + void setModulesDeclaration(bool I) { + IsModulesDecl = I; + if (I) + NeedsHandleIdentifier = true; + else + RecomputeNeedsHandleIdentifier(); + } + /// Determine whether this is the mangled name of an OpenMP variant. bool isMangledOpenMPVariantName() const { return IsMangledOpenMPVariantName; } @@ -745,10 +762,11 @@ class IdentifierTable { // contents. II->Entry = &Entry; - // If this is the 'import' contextual keyword, mark it as such. + // If this is the 'import' or 'module' contextual keyword, mark it as such. if (Name == "import") II->setModulesImport(true); - + else if (Name == "module") + II->setModulesDeclaration(true); return *II; } diff --git a/clang/include/clang/Basic/TokenKinds.def b/clang/include/clang/Basic/TokenKinds.def index 94e72fea56a68..7750c84dbef78 100644 --- a/clang/include/clang/Basic/TokenKinds.def +++ b/clang/include/clang/Basic/TokenKinds.def @@ -133,6 +133,9 @@ PPKEYWORD(pragma) // C23 & C++26 #embed PPKEYWORD(embed) +// C++20 Module Directive +PPKEYWORD(module) + // GNU Extensions. PPKEYWORD(import) PPKEYWORD(include_next) @@ -1023,6 +1026,9 @@ ANNOTATION(module_include) ANNOTATION(module_begin) ANNOTATION(module_end) +// Annotations for C++, Clang and Objective-C named modules. +ANNOTATION(module_name) + // Annotation for a header_name token that has been looked up and transformed // into the name of a header unit. ANNOTATION(header_unit) diff --git a/clang/include/clang/Frontend/CompilerInstance.h b/clang/include/clang/Frontend/CompilerInstance.h index 0ae490f0e8073..112d3b00160fd 100644 --- a/clang/include/clang/Frontend/CompilerInstance.h +++ b/clang/include/clang/Frontend/CompilerInstance.h @@ -863,7 +863,7 @@ class CompilerInstance : public ModuleLoader { /// load it. ModuleLoadResult findOrCompileModuleAndReadAST(StringRef ModuleName, SourceLocation ImportLoc, - SourceLocation ModuleNameLoc, + SourceRange ModuleNameRange, bool IsInclusionDirective); /// Creates a \c CompilerInstance for compiling a module. diff --git a/clang/include/clang/Lex/CodeCompletionHandler.h b/clang/include/clang/Lex/CodeCompletionHandler.h index bd3e05a36bb33..2ef29743415ae 100644 --- a/clang/include/clang/Lex/CodeCompletionHandler.h +++ b/clang/include/clang/Lex/CodeCompletionHandler.h @@ -13,12 +13,15 @@ #ifndef LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H #define LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H +#include "clang/Basic/IdentifierTable.h" +#include "clang/Basic/SourceLocation.h" #include "llvm/ADT/StringRef.h" namespace clang { class IdentifierInfo; class MacroInfo; +using ModuleIdPath = ArrayRef<IdentifierLoc>; /// Callback handler that receives notifications when performing code /// completion within the preprocessor. @@ -70,6 +73,11 @@ class CodeCompletionHandler { /// file where we expect natural language, e.g., a comment, string, or /// \#error directive. virtual void CodeCompleteNaturalLanguage() { } + + /// Callback invoked when performing code completion inside the module name + /// part of an import directive. + virtual void CodeCompleteModuleImport(SourceLocation ImportLoc, + ModuleIdPath Path) {} }; } diff --git a/clang/include/clang/Lex/Lexer.h b/clang/include/clang/Lex/Lexer.h index bb65ae010cffa..a595cda1eaa77 100644 --- a/clang/include/clang/Lex/Lexer.h +++ b/clang/include/clang/Lex/Lexer.h @@ -124,7 +124,7 @@ class Lexer : public PreprocessorLexer { //===--------------------------------------------------------------------===// // Context that changes as the file is lexed. // NOTE: any state that mutates when in raw mode must have save/restore code - // in Lexer::isNextPPTokenLParen. + // in Lexer::peekNextPPToken. // BufferPtr - Current pointer into the buffer. This is the next character // to be lexed. @@ -642,10 +642,10 @@ class Lexer : public PreprocessorLexer { BufferPtr = TokEnd; } - /// isNextPPTokenLParen - Return 1 if the next unexpanded token will return a - /// tok::l_paren token, 0 if it is something else and 2 if there are no more - /// tokens in the buffer controlled by this lexer. - unsigned isNextPPTokenLParen(); + /// peekNextPPToken - Return std::nullopt if there are no more tokens in the + /// buffer controlled by this lexer, otherwise return the next unexpanded + /// token. + std::optional<Token> peekNextPPToken(); //===--------------------------------------------------------------------===// // Lexer character reading interfaces. diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h index f2dfd3a349b8b..79a75a116c418 100644 --- a/clang/include/clang/Lex/Preprocessor.h +++ b/clang/include/clang/Lex/Preprocessor.h @@ -48,6 +48,7 @@ #include "llvm/Support/Allocator.h" #include "llvm/Support/Casting.h" #include "llvm/Support/Registry.h" +#include "llvm/Support/TrailingObjects.h" #include <cassert> #include <cstddef> #include <cstdint> @@ -82,6 +83,7 @@ class PreprocessorLexer; class PreprocessorOptions; class ScratchBuffer; class TargetInfo; +class ModuleNameLoc; namespace Builtin { class Context; @@ -332,8 +334,9 @@ class Preprocessor { /// lexed, if any. SourceLocation ModuleImportLoc; - /// The import path for named module that we're currently processing. - SmallVector<IdentifierLoc, 2> NamedModuleImportPath; + /// The source location of the \c module contextual keyword we just + /// lexed, if any. + SourceLocation ModuleDeclLoc; llvm::DenseMap<FileID, SmallVector<const char *>> CheckPoints; unsigned CheckPointCounter = 0; @@ -344,6 +347,21 @@ class Preprocessor { /// Whether the last token we lexed was an '@'. bool LastTokenWasAt = false; + /// Whether we're importing a standard C++20 named Modules. + bool ImportingCXXNamedModules = false; + + /// Whether we're declaring a standard C++20 named Modules. + bool DeclaringCXXNamedModules = false; + + struct ExportContextualKeywordInfo { + Token ExportTok; + bool TokAtPhysicalStartOfLine; + }; + + /// Whether the last token we lexed was an 'export' keyword. + std::optional<ExportContextualKeywordInfo> LastTokenWasExportKeyword = + std::nullopt; + /// A position within a C++20 import-seq. class StdCXXImportSeq { public: @@ -547,12 +565,7 @@ class Preprocessor { reset(); } - void handleIdentifier(IdentifierInfo *Identifier) { - if (isModuleCandidate() && Identifier) - Name += Identifier->getName().str(); - else if (!isNamedModule()) - reset(); - } + void handleModuleName(ModuleNameLoc *Path); void handleColon() { if (isModuleCandidate()) @@ -561,13 +574,6 @@ class Preprocessor { reset(); } - void handlePeriod() { - if (isModuleCandidate()) - Name += "."; - else if (!isNamedModule()) - reset(); - } - void handleSemi() { if (!Name.empty() && isModuleCandidate()) { if (State == InterfaceCandidate) @@ -622,10 +628,6 @@ class Preprocessor { ModuleDeclSeq ModuleDeclState; - /// Whether the module import expects an identifier next. Otherwise, - /// it expects a '.' or ';'. - bool ModuleImportExpectsIdentifier = false; - /// The identifier and source location of the currently-active /// \#pragma clang arc_cf_code_audited begin. IdentifierLoc PragmaARCCFCodeAuditedInfo; @@ -1759,6 +1761,19 @@ class Preprocessor { /// Lex the parameters for an #embed directive, returns nullopt on error. std::optional<LexEmbedParametersResult> LexEmbedParameters(Token &Current, bool ForHasEmbed); + bool LexModuleNameContinue(Token &Tok, SourceLocation UseLoc, + SmallVectorImpl<IdentifierLoc> &Path, + bool AllowMacroExpansion = true); + void HandleCXXImportDirective(Token Import); + void HandleCXXModuleDirective(Token Module); + + /// Callback invoked when the lexer sees one of export, import or module token + /// at the start of a line. + /// + /// This consumes the import, module directive, modifies the + /// lexer/preprocessor state, and advances the lexer(s) so that the next token + /// read is the correct one. + bool HandleModuleContextualKeyword(Token &Result, bool TokAtPhysicalStartOfLine); bool LexAfterModuleImport(Token &Result); void CollectPpImportSuffix(SmallVectorImpl<Token> &Toks); @@ -2282,7 +2297,9 @@ class Preprocessor { /// Determine whether the next preprocessor token to be /// lexed is a '('. If so, consume the token and return true, if not, this /// method should have no observable side-effect on the lexed tokens. - bool isNextPPTokenLParen(); + bool isNextPPTokenLParen() { + return peekNextPPToken().value_or(Token{}).is(tok::l_paren); + } private: /// Identifiers used for SEH handling in Borland. These are only @@ -2342,7 +2359,7 @@ class Preprocessor { /// /// \return The location of the end of the directive (the terminating /// newline). - SourceLocation CheckEndOfDirective(const char *DirType, + SourceLocation CheckEndOfDirective(StringRef DirType, bool EnableMacros = false); /// Read and discard all tokens remaining on the current line until @@ -2424,11 +2441,12 @@ class Preprocessor { } /// If we're importing a standard C++20 Named Modules. - bool isInImportingCXXNamedModules() const { - // NamedModuleImportPath will be non-empty only if we're importing - // Standard C++ named modules. - return !NamedModuleImportPath.empty() && getLangOpts().CPlusPlusModules && - !IsAtImport; + bool isImportingCXXNamedModules() const { + return getLangOpts().CPlusPlusModules && ImportingCXXNamedModules; + } + + bool isDeclaringCXXNamedModules() const { + return getLangOpts().CPlusPlusModules && DeclaringCXXNamedModules; } /// Allocate a new MacroInfo object with the provided SourceLocation. @@ -2661,6 +2679,10 @@ class Preprocessor { void removeCachedMacroExpandedTokensOfLastLexer(); + /// Peek the next token. If so, return the token, if not, this + /// method should have no observable side-effect on the lexed tokens. + std::optional<Token> peekNextPPToken(); + /// After reading "MACRO(", this method is invoked to read all of the formal /// arguments specified for the macro invocation. Returns null on error. MacroArgs *ReadMacroCallArgumentList(Token &MacroName, MacroInfo *MI, @@ -3078,6 +3100,53 @@ struct EmbedAnnotationData { StringRef FileName; }; +/// Represents module name annotation data. +/// +/// module-name: +/// module-name-qualifier[opt] identifier +/// +/// partition-name: [C++20] +/// : module-name-qualifier[opt] identifier +/// +/// module-name-qualifier +/// module-name-qualifier[opt] identifier . +class ModuleNameLoc final + : llvm::TrailingObjects<ModuleNameLoc, IdentifierLoc> { + friend TrailingObjects; + unsigned NumIdentifierLocs; + + unsigned numTrailingObjects(OverloadToken<IdentifierLoc>) const { + return getNumIdentifierLocs(); + } + + ModuleNameLoc(ModuleIdPath Path) : NumIdentifierLocs(Path.size()) { + (void)llvm::copy(Path, getTrailingObjects<IdentifierLoc>()); + } + +public: + static std::string stringFromModuleIdPath(ModuleIdPath Path); + static ModuleNameLoc *Create(Preprocessor &PP, ModuleIdPath Path); + static Token CreateAnnotToken(Preprocessor &PP, ModuleIdPath Path); + unsigned getNumIdentifierLocs() const { return NumIdentifierLocs; } + ModuleIdPath getModuleIdPath() const { + return {getTrailingObjects<IdentifierLoc>(), getNumIdentifierLocs()}; + } + + SourceLocation getBeginLoc() const { + return getModuleIdPath().front().getLoc(); + } + SourceLocation getEndLoc() const { + auto &Last = getModuleIdPath().back(); + return Last.getLoc().getLocWithOffset( + Last.getIdentifierInfo()->getLength()); + } + SourceRange getRange() const { return {getBeginLoc(), getEndLoc()}; } + + std::string str() const; + void print(llvm::raw_ostream &OS) const; + void dump() const { print(llvm::errs()); } +}; + /// Registry of pragma handlers added by plugins using PragmaHandlerRegistry = llvm::Registry<PragmaHandler>; diff --git a/clang/include/clang/Lex/Token.h b/clang/include/clang/Lex/Token.h index 4f29fb7d11415..8e81207ddf8d7 100644 --- a/clang/include/clang/Lex/Token.h +++ b/clang/include/clang/Lex/Token.h @@ -231,6 +231,9 @@ class Token { PtrData = const_cast<char*>(Ptr); } + template <class T> T getAnnotationValueAs() const { + return static_cast<T>(getAnnotationValue()); + } void *getAnnotationValue() const { assert(isAnnotation() && "Used AnnotVal on non-annotation token"); return PtrData; @@ -289,6 +292,10 @@ class Token { /// Return the ObjC keyword kind. tok::ObjCKeywordKind getObjCKeywordID() const; + /// Return true if we have an C++20 Modules contextual keyword(export, import + /// or module). + bool isModuleContextualKeyword(bool AllowExport = true) const; + bool isSimpleTypeSpecifier(const LangOptions &LangOpts) const; /// Return true if this token has trigraphs or escaped newlines in it. diff --git a/clang/include/clang/Lex/TokenLexer.h b/clang/include/clang/Lex/TokenLexer.h index 4d229ae610674..777b4e6266c71 100644 --- a/clang/include/clang/Lex/TokenLexer.h +++ b/clang/include/clang/Lex/TokenLexer.h @@ -139,10 +139,9 @@ class TokenLexer { void Init(const Token *TokArray, unsigned NumToks, bool DisableMacroExpansion, bool OwnsTokens, bool IsReinject); - /// If the next token lexed will pop this macro off the - /// expansion stack, return 2. If the next unexpanded token is a '(', return - /// 1, otherwise return 0. - unsigned isNextTokenLParen() const; + /// If the next token lexed will pop this macro off the expansion stack, + /// return std::nullopt, otherwise return the next unexpanded token. + std::optional<Token> peekNextPPToken() const; /// Lex and return a token from this macro stream. bool Lex(Token &Tok); diff --git a/clang/include/clang/Parse/Parser.h b/clang/include/clang/Parse/Parser.h index c4bef4729fd36..a59a99bbac7c6 100644 --- a/clang/include/clang/Parse/Parser.h +++ b/clang/include/clang/Parse/Parser.h @@ -1079,6 +1079,8 @@ class Parser : public CodeCompletionHandler { unsigned ArgumentIndex) override; ... [truncated] |
@llvm/pr-subscribers-clang Author: None (yronglin) ChangesImplement P1857R3 Modules Dependency Discovery.
At the start of phase 4 an import or module token is treated as starting a directive and are converted to their respective keywords iff:
Otherwise the token is treated as an identifier. Additionally:
Need add more test Patch is 134.74 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107168.diff 43 Files Affected:
diff --git a/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp b/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp index d872020c2d8a3..22a3eb97f938b 100644 --- a/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp +++ b/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp @@ -65,7 +65,7 @@ class PragmaAnnotateHandler : public PragmaHandler { Token Tok; PP.LexUnexpandedToken(Tok); if (Tok.isNot(tok::eod)) - PP.Diag(Tok, diag::ext_pp_extra_tokens_at_eol) << "pragma"; + PP.Diag(Tok, diag::ext_pp_extra_tokens_at_eol) << "#pragma"; if (HandledDecl) { DiagnosticsEngine &D = PP.getDiagnostics(); diff --git a/clang/include/clang/Basic/DiagnosticLexKinds.td b/clang/include/clang/Basic/DiagnosticLexKinds.td index 723f5d48b4f5f..f975a63b369b5 100644 --- a/clang/include/clang/Basic/DiagnosticLexKinds.td +++ b/clang/include/clang/Basic/DiagnosticLexKinds.td @@ -466,6 +466,8 @@ def err_pp_embed_device_file : Error< def ext_pp_extra_tokens_at_eol : ExtWarn< "extra tokens at end of #%0 directive">, InGroup<ExtraTokens>; +def ext_pp_extra_tokens_at_module_directive_eol : ExtWarn< + "extra tokens at end of '%0' directive">, InGroup<ExtraTokens>; def ext_pp_comma_expr : Extension<"comma operator in operand of #if">; def ext_pp_bad_vaargs_use : Extension< @@ -496,7 +498,7 @@ def warn_cxx98_compat_variadic_macro : Warning< def ext_named_variadic_macro : Extension< "named variadic macros are a GNU extension">, InGroup<VariadicMacros>; def err_embedded_directive : Error< - "embedding a #%0 directive within macro arguments is not supported">; + "embedding a %select{#|C++ }0%1 directive within macro arguments is not supported">; def ext_embedded_directive : Extension< "embedding a directive within macro arguments has undefined behavior">, InGroup<DiagGroup<"embedded-directive">>; @@ -983,6 +985,18 @@ def warn_module_conflict : Warning< InGroup<ModuleConflict>; // C++20 modules +def err_pp_expected_module_name_or_header_name : Error< + "expected module name or header name">; +def err_pp_expected_semi_after_module_or_import : Error< + "'%select{module|import}0' directive must end with a ';' on the same line">; +def err_module_decl_in_header : Error< + "module declaration must not come from an #include directive">; +def err_pp_cond_span_module_decl : Error< + "preprocessor conditionals shall not span a module declaration">; +def err_pp_module_expected_ident : Error< + "expected a module name after '%select{module|import}0'">; +def err_pp_unsupported_module_partition : Error< + "module partitions are only supported for C++20 onwards">; def err_header_import_semi_in_macro : Error< "semicolon terminating header import declaration cannot be produced " "by a macro">; diff --git a/clang/include/clang/Basic/DiagnosticParseKinds.td b/clang/include/clang/Basic/DiagnosticParseKinds.td index 3aa36ad59d0b9..c06e2f090b429 100644 --- a/clang/include/clang/Basic/DiagnosticParseKinds.td +++ b/clang/include/clang/Basic/DiagnosticParseKinds.td @@ -1760,8 +1760,8 @@ def ext_bit_int : Extension< } // end of Parse Issue category. let CategoryName = "Modules Issue" in { -def err_unexpected_module_decl : Error< - "module declaration can only appear at the top level">; +def err_unexpected_module_import_decl : Error< + "%select{module|import}0 declaration can only appear at the top level">; def err_module_expected_ident : Error< "expected a module name after '%select{module|import}0'">; def err_attribute_not_module_attr : Error< @@ -1782,8 +1782,6 @@ def err_module_fragment_exported : Error< def err_private_module_fragment_expected_semi : Error< "expected ';' after private module fragment declaration">; def err_missing_before_module_end : Error<"expected %0 at end of module">; -def err_unsupported_module_partition : Error< - "module partitions are only supported for C++20 onwards">; def err_import_not_allowed_here : Error< "imports must immediately follow the module declaration">; def err_partition_import_outside_module : Error< diff --git a/clang/include/clang/Basic/IdentifierTable.h b/clang/include/clang/Basic/IdentifierTable.h index 54540193cfcc0..add6c6ac629a1 100644 --- a/clang/include/clang/Basic/IdentifierTable.h +++ b/clang/include/clang/Basic/IdentifierTable.h @@ -179,6 +179,10 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo { LLVM_PREFERRED_TYPE(bool) unsigned IsModulesImport : 1; + // True if this is the 'module' contextual keyword. + LLVM_PREFERRED_TYPE(bool) + unsigned IsModulesDecl : 1; + // True if this is a mangled OpenMP variant name. LLVM_PREFERRED_TYPE(bool) unsigned IsMangledOpenMPVariantName : 1; @@ -215,8 +219,9 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo { IsCPPOperatorKeyword(false), NeedsHandleIdentifier(false), IsFromAST(false), ChangedAfterLoad(false), FEChangedAfterLoad(false), RevertedTokenID(false), OutOfDate(false), IsModulesImport(false), - IsMangledOpenMPVariantName(false), IsDeprecatedMacro(false), - IsRestrictExpansion(false), IsFinal(false), IsKeywordInCpp(false) {} + IsModulesDecl(false), IsMangledOpenMPVariantName(false), + IsDeprecatedMacro(false), IsRestrictExpansion(false), IsFinal(false), + IsKeywordInCpp(false) {} public: IdentifierInfo(const IdentifierInfo &) = delete; @@ -528,6 +533,18 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo { RecomputeNeedsHandleIdentifier(); } + /// Determine whether this is the contextual keyword \c module. + bool isModulesDeclaration() const { return IsModulesDecl; } + + /// Set whether this identifier is the contextual keyword \c module. + void setModulesDeclaration(bool I) { + IsModulesDecl = I; + if (I) + NeedsHandleIdentifier = true; + else + RecomputeNeedsHandleIdentifier(); + } + /// Determine whether this is the mangled name of an OpenMP variant. bool isMangledOpenMPVariantName() const { return IsMangledOpenMPVariantName; } @@ -745,10 +762,11 @@ class IdentifierTable { // contents. II->Entry = &Entry; - // If this is the 'import' contextual keyword, mark it as such. + // If this is the 'import' or 'module' contextual keyword, mark it as such. if (Name == "import") II->setModulesImport(true); - + else if (Name == "module") + II->setModulesDeclaration(true); return *II; } diff --git a/clang/include/clang/Basic/TokenKinds.def b/clang/include/clang/Basic/TokenKinds.def index 94e72fea56a68..7750c84dbef78 100644 --- a/clang/include/clang/Basic/TokenKinds.def +++ b/clang/include/clang/Basic/TokenKinds.def @@ -133,6 +133,9 @@ PPKEYWORD(pragma) // C23 & C++26 #embed PPKEYWORD(embed) +// C++20 Module Directive +PPKEYWORD(module) + // GNU Extensions. PPKEYWORD(import) PPKEYWORD(include_next) @@ -1023,6 +1026,9 @@ ANNOTATION(module_include) ANNOTATION(module_begin) ANNOTATION(module_end) +// Annotations for C++, Clang and Objective-C named modules. +ANNOTATION(module_name) + // Annotation for a header_name token that has been looked up and transformed // into the name of a header unit. ANNOTATION(header_unit) diff --git a/clang/include/clang/Frontend/CompilerInstance.h b/clang/include/clang/Frontend/CompilerInstance.h index 0ae490f0e8073..112d3b00160fd 100644 --- a/clang/include/clang/Frontend/CompilerInstance.h +++ b/clang/include/clang/Frontend/CompilerInstance.h @@ -863,7 +863,7 @@ class CompilerInstance : public ModuleLoader { /// load it. ModuleLoadResult findOrCompileModuleAndReadAST(StringRef ModuleName, SourceLocation ImportLoc, - SourceLocation ModuleNameLoc, + SourceRange ModuleNameRange, bool IsInclusionDirective); /// Creates a \c CompilerInstance for compiling a module. diff --git a/clang/include/clang/Lex/CodeCompletionHandler.h b/clang/include/clang/Lex/CodeCompletionHandler.h index bd3e05a36bb33..2ef29743415ae 100644 --- a/clang/include/clang/Lex/CodeCompletionHandler.h +++ b/clang/include/clang/Lex/CodeCompletionHandler.h @@ -13,12 +13,15 @@ #ifndef LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H #define LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H +#include "clang/Basic/IdentifierTable.h" +#include "clang/Basic/SourceLocation.h" #include "llvm/ADT/StringRef.h" namespace clang { class IdentifierInfo; class MacroInfo; +using ModuleIdPath = ArrayRef<IdentifierLoc>; /// Callback handler that receives notifications when performing code /// completion within the preprocessor. @@ -70,6 +73,11 @@ class CodeCompletionHandler { /// file where we expect natural language, e.g., a comment, string, or /// \#error directive. virtual void CodeCompleteNaturalLanguage() { } + + /// Callback invoked when performing code completion inside the module name + /// part of an import directive. + virtual void CodeCompleteModuleImport(SourceLocation ImportLoc, + ModuleIdPath Path) {} }; } diff --git a/clang/include/clang/Lex/Lexer.h b/clang/include/clang/Lex/Lexer.h index bb65ae010cffa..a595cda1eaa77 100644 --- a/clang/include/clang/Lex/Lexer.h +++ b/clang/include/clang/Lex/Lexer.h @@ -124,7 +124,7 @@ class Lexer : public PreprocessorLexer { //===--------------------------------------------------------------------===// // Context that changes as the file is lexed. // NOTE: any state that mutates when in raw mode must have save/restore code - // in Lexer::isNextPPTokenLParen. + // in Lexer::peekNextPPToken. // BufferPtr - Current pointer into the buffer. This is the next character // to be lexed. @@ -642,10 +642,10 @@ class Lexer : public PreprocessorLexer { BufferPtr = TokEnd; } - /// isNextPPTokenLParen - Return 1 if the next unexpanded token will return a - /// tok::l_paren token, 0 if it is something else and 2 if there are no more - /// tokens in the buffer controlled by this lexer. - unsigned isNextPPTokenLParen(); + /// peekNextPPToken - Return std::nullopt if there are no more tokens in the + /// buffer controlled by this lexer, otherwise return the next unexpanded + /// token. + std::optional<Token> peekNextPPToken(); //===--------------------------------------------------------------------===// // Lexer character reading interfaces. diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h index f2dfd3a349b8b..79a75a116c418 100644 --- a/clang/include/clang/Lex/Preprocessor.h +++ b/clang/include/clang/Lex/Preprocessor.h @@ -48,6 +48,7 @@ #include "llvm/Support/Allocator.h" #include "llvm/Support/Casting.h" #include "llvm/Support/Registry.h" +#include "llvm/Support/TrailingObjects.h" #include <cassert> #include <cstddef> #include <cstdint> @@ -82,6 +83,7 @@ class PreprocessorLexer; class PreprocessorOptions; class ScratchBuffer; class TargetInfo; +class ModuleNameLoc; namespace Builtin { class Context; @@ -332,8 +334,9 @@ class Preprocessor { /// lexed, if any. SourceLocation ModuleImportLoc; - /// The import path for named module that we're currently processing. - SmallVector<IdentifierLoc, 2> NamedModuleImportPath; + /// The source location of the \c module contextual keyword we just + /// lexed, if any. + SourceLocation ModuleDeclLoc; llvm::DenseMap<FileID, SmallVector<const char *>> CheckPoints; unsigned CheckPointCounter = 0; @@ -344,6 +347,21 @@ class Preprocessor { /// Whether the last token we lexed was an '@'. bool LastTokenWasAt = false; + /// Whether we're importing a standard C++20 named Modules. + bool ImportingCXXNamedModules = false; + + /// Whether we're declaring a standard C++20 named Modules. + bool DeclaringCXXNamedModules = false; + + struct ExportContextualKeywordInfo { + Token ExportTok; + bool TokAtPhysicalStartOfLine; + }; + + /// Whether the last token we lexed was an 'export' keyword. + std::optional<ExportContextualKeywordInfo> LastTokenWasExportKeyword = + std::nullopt; + /// A position within a C++20 import-seq. class StdCXXImportSeq { public: @@ -547,12 +565,7 @@ class Preprocessor { reset(); } - void handleIdentifier(IdentifierInfo *Identifier) { - if (isModuleCandidate() && Identifier) - Name += Identifier->getName().str(); - else if (!isNamedModule()) - reset(); - } + void handleModuleName(ModuleNameLoc *Path); void handleColon() { if (isModuleCandidate()) @@ -561,13 +574,6 @@ class Preprocessor { reset(); } - void handlePeriod() { - if (isModuleCandidate()) - Name += "."; - else if (!isNamedModule()) - reset(); - } - void handleSemi() { if (!Name.empty() && isModuleCandidate()) { if (State == InterfaceCandidate) @@ -622,10 +628,6 @@ class Preprocessor { ModuleDeclSeq ModuleDeclState; - /// Whether the module import expects an identifier next. Otherwise, - /// it expects a '.' or ';'. - bool ModuleImportExpectsIdentifier = false; - /// The identifier and source location of the currently-active /// \#pragma clang arc_cf_code_audited begin. IdentifierLoc PragmaARCCFCodeAuditedInfo; @@ -1759,6 +1761,19 @@ class Preprocessor { /// Lex the parameters for an #embed directive, returns nullopt on error. std::optional<LexEmbedParametersResult> LexEmbedParameters(Token &Current, bool ForHasEmbed); + bool LexModuleNameContinue(Token &Tok, SourceLocation UseLoc, + SmallVectorImpl<IdentifierLoc> &Path, + bool AllowMacroExpansion = true); + void HandleCXXImportDirective(Token Import); + void HandleCXXModuleDirective(Token Module); + + /// Callback invoked when the lexer sees one of export, import or module token + /// at the start of a line. + /// + /// This consumes the import, module directive, modifies the + /// lexer/preprocessor state, and advances the lexer(s) so that the next token + /// read is the correct one. + bool HandleModuleContextualKeyword(Token &Result, bool TokAtPhysicalStartOfLine); bool LexAfterModuleImport(Token &Result); void CollectPpImportSuffix(SmallVectorImpl<Token> &Toks); @@ -2282,7 +2297,9 @@ class Preprocessor { /// Determine whether the next preprocessor token to be /// lexed is a '('. If so, consume the token and return true, if not, this /// method should have no observable side-effect on the lexed tokens. - bool isNextPPTokenLParen(); + bool isNextPPTokenLParen() { + return peekNextPPToken().value_or(Token{}).is(tok::l_paren); + } private: /// Identifiers used for SEH handling in Borland. These are only @@ -2342,7 +2359,7 @@ class Preprocessor { /// /// \return The location of the end of the directive (the terminating /// newline). - SourceLocation CheckEndOfDirective(const char *DirType, + SourceLocation CheckEndOfDirective(StringRef DirType, bool EnableMacros = false); /// Read and discard all tokens remaining on the current line until @@ -2424,11 +2441,12 @@ class Preprocessor { } /// If we're importing a standard C++20 Named Modules. - bool isInImportingCXXNamedModules() const { - // NamedModuleImportPath will be non-empty only if we're importing - // Standard C++ named modules. - return !NamedModuleImportPath.empty() && getLangOpts().CPlusPlusModules && - !IsAtImport; + bool isImportingCXXNamedModules() const { + return getLangOpts().CPlusPlusModules && ImportingCXXNamedModules; + } + + bool isDeclaringCXXNamedModules() const { + return getLangOpts().CPlusPlusModules && DeclaringCXXNamedModules; } /// Allocate a new MacroInfo object with the provided SourceLocation. @@ -2661,6 +2679,10 @@ class Preprocessor { void removeCachedMacroExpandedTokensOfLastLexer(); + /// Peek the next token. If so, return the token, if not, this + /// method should have no observable side-effect on the lexed tokens. + std::optional<Token> peekNextPPToken(); + /// After reading "MACRO(", this method is invoked to read all of the formal /// arguments specified for the macro invocation. Returns null on error. MacroArgs *ReadMacroCallArgumentList(Token &MacroName, MacroInfo *MI, @@ -3078,6 +3100,53 @@ struct EmbedAnnotationData { StringRef FileName; }; +/// Represents module name annotation data. +/// +/// module-name: +/// module-name-qualifier[opt] identifier +/// +/// partition-name: [C++20] +/// : module-name-qualifier[opt] identifier +/// +/// module-name-qualifier +/// module-name-qualifier[opt] identifier . +class ModuleNameLoc final + : llvm::TrailingObjects<ModuleNameLoc, IdentifierLoc> { + friend TrailingObjects; + unsigned NumIdentifierLocs; + + unsigned numTrailingObjects(OverloadToken<IdentifierLoc>) const { + return getNumIdentifierLocs(); + } + + ModuleNameLoc(ModuleIdPath Path) : NumIdentifierLocs(Path.size()) { + (void)llvm::copy(Path, getTrailingObjects<IdentifierLoc>()); + } + +public: + static std::string stringFromModuleIdPath(ModuleIdPath Path); + static ModuleNameLoc *Create(Preprocessor &PP, ModuleIdPath Path); + static Token CreateAnnotToken(Preprocessor &PP, ModuleIdPath Path); + unsigned getNumIdentifierLocs() const { return NumIdentifierLocs; } + ModuleIdPath getModuleIdPath() const { + return {getTrailingObjects<IdentifierLoc>(), getNumIdentifierLocs()}; + } + + SourceLocation getBeginLoc() const { + return getModuleIdPath().front().getLoc(); + } + SourceLocation getEndLoc() const { + auto &Last = getModuleIdPath().back(); + return Last.getLoc().getLocWithOffset( + Last.getIdentifierInfo()->getLength()); + } + SourceRange getRange() const { return {getBeginLoc(), getEndLoc()}; } + + std::string str() const; + void print(llvm::raw_ostream &OS) const; + void dump() const { print(llvm::errs()); } +}; + /// Registry of pragma handlers added by plugins using PragmaHandlerRegistry = llvm::Registry<PragmaHandler>; diff --git a/clang/include/clang/Lex/Token.h b/clang/include/clang/Lex/Token.h index 4f29fb7d11415..8e81207ddf8d7 100644 --- a/clang/include/clang/Lex/Token.h +++ b/clang/include/clang/Lex/Token.h @@ -231,6 +231,9 @@ class Token { PtrData = const_cast<char*>(Ptr); } + template <class T> T getAnnotationValueAs() const { + return static_cast<T>(getAnnotationValue()); + } void *getAnnotationValue() const { assert(isAnnotation() && "Used AnnotVal on non-annotation token"); return PtrData; @@ -289,6 +292,10 @@ class Token { /// Return the ObjC keyword kind. tok::ObjCKeywordKind getObjCKeywordID() const; + /// Return true if we have an C++20 Modules contextual keyword(export, import + /// or module). + bool isModuleContextualKeyword(bool AllowExport = true) const; + bool isSimpleTypeSpecifier(const LangOptions &LangOpts) const; /// Return true if this token has trigraphs or escaped newlines in it. diff --git a/clang/include/clang/Lex/TokenLexer.h b/clang/include/clang/Lex/TokenLexer.h index 4d229ae610674..777b4e6266c71 100644 --- a/clang/include/clang/Lex/TokenLexer.h +++ b/clang/include/clang/Lex/TokenLexer.h @@ -139,10 +139,9 @@ class TokenLexer { void Init(const Token *TokArray, unsigned NumToks, bool DisableMacroExpansion, bool OwnsTokens, bool IsReinject); - /// If the next token lexed will pop this macro off the - /// expansion stack, return 2. If the next unexpanded token is a '(', return - /// 1, otherwise return 0. - unsigned isNextTokenLParen() const; + /// If the next token lexed will pop this macro off the expansion stack, + /// return std::nullopt, otherwise return the next unexpanded token. + std::optional<Token> peekNextPPToken() const; /// Lex and return a token from this macro stream. bool Lex(Token &Tok); diff --git a/clang/include/clang/Parse/Parser.h b/clang/include/clang/Parse/Parser.h index c4bef4729fd36..a59a99bbac7c6 100644 --- a/clang/include/clang/Parse/Parser.h +++ b/clang/include/clang/Parse/Parser.h @@ -1079,6 +1079,8 @@ class Parser : public CodeCompletionHandler { unsigned ArgumentIndex) override; ... [truncated] |
Signed-off-by: yronglin <yronglin777@gmail.com>
d300a2b
to 04ddbf6
Compare
|
Signed-off-by: yronglin <yronglin777@gmail.com>
Fixed. |
I try to implement the 2nd approach, generate kw_import as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No identifier in the pp-module-name or pp-module-partition shall currently be defined as an object-like macro.
These should error in preprocessing with -Dn=x
:
export module n;
export module m.n;
export module m:n;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where the pp-tokens (if any) shall not begin with a
(
preprocessing token
This should ideally error in preprocessing (as opposed to later).
export module m();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that it not only suggests wording changes necessary to achieve the intent of https://wg21.link/p3034, it also has a number of examples demonstrating that the wording allows
export module m; int n;
The above works with Clang before this PR:
$ oldclang -std=c++26 -fsyntax-only -xc++-module -<<<$'export module M; int n;' Return: 0x00:0 Wed Sep 24 19:36:21 2025 EDT
but stops working with the PR in its current form:
$ clang -std=c++26 -fsyntax-only -xc++-module -<<<$'export module M; int n;' <stdin>:1:18: warning: extra tokens at end of 'module' directive [-Wextra-tokens] 1 | export module M; int n; | ^ | // <stdin>:1:24: error: expected unqualified-id 1 | export module M; int n; | ^ 1 warning and 1 error generated. Return: 0x01:1 Wed Sep 24 19:37:13 2025 EDT
It is hard to believe it is by design to allow "export module m; int n;" while we reject others. Is is possible to adjust the wording for it? |
Why is that hard to believe? The feature originally was not envisioned as a preprocessor directive, so the "line" aspect is not strongly enforced. The later adjustments made "minimal" changes, which can be seen as a sign of intent not to affect the validity of existing code if said code was not "problematic" within the scope of the problems that the papers/issues were trying to address. |
Because there's no need to mix a completely new directive with fundamentally different instructions (and parsing) on the same line. All the complexity required to support this (much less the mental stumbling blocks for readers of the code) stand in contrast to the trivial alternative of "add a newline". The putative burden of doing so doesn't even come close to justifying the resulting complexity -- it's entirely self-defeating of the standard to allow this IMO. |
I believe I have explained why this was not standardized as a "new directive" from a design viewpoint. The directives were more a means to an end. |
The design was about what's required to be able to figure stuff out at the start of phase 4. If we really wanted we could fully duplicate the attribute grammar into the preprocessor to be able to fully restrict the line, but these weren't really intended to be preprocessor control lines. That was just a good way to restrict how they could be preprocessed. |
…me line Signed-off-by: yronglin <yronglin777@gmail.com>
Signed-off-by: yronglin <yronglin777@gmail.com>
Thanks for the review! I have added examples from CWG2947 into export module M [[ attr1, attr2 ]] ; // OK in CWG2947, but clang reject, because ; not in same line. export module M [[ attr1, attr2 ]] ; // Same as above But this restriction can be easily removed. What do you think? (A question about CWG2947, seems it has not yet been accepted into std proposal) ? |
Signed-off-by: yronglin <yronglin777@gmail.com>
I think we should implement CWG2947 as written and allow that. CWG2947 has not yet been accepted, but I don't believe there will be issues there. |
I think the caveat is "unless the implementation attempt finds problems" (which is all the more reason to try it). The way the Core Issue resolution is worded, implementing it may prove to be difficult (depending on how the existing preprocessor is structured) because, once we interpret it as performing the check only after macros have been replaced, the interpretation starts down the path where the check is performed only after further directives are processed (which actually tracks with the intent that programs that are valid pre-P1857 remain valid unless they cause "dependency discovery" difficulties). For example: ; a.cppm: export module a #include "a.h" Anyhow, even without throwing CWG2947 into the mix, stuff like this is allowed: export module x _Pragma("GCC warning \"Hi\""); |
Signed-off-by: yronglin <yronglin777@gmail.com>
… with this change, we can currectly handle 'export module x BAR;', BAR is a macro expands to '.y' Signed-off-by: yronglin <yronglin777@gmail.com>
The above is working, but the seemingly simpler case is not: export module x; _Pragma("GCC warning \"hi\""); Gives:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs tests for -E
.
The line separation is not getting maintained:
export module m; int x; extern "C++" int *y = &x;
Output:
# 1 "<stdin>" # 1 "<built-in>" 1 # 1 "<built-in>" 3 # 665 "<built-in>" 3 # 1 "<command line>" 1 # 1 "<built-in>" 2 # 1 "<stdin>" 2 export module m; int x;extern "C++" int *y = &x;
Signed-off-by: yronglin <yronglin777@gmail.com>
I've temporarily added CWG2947 support in the patch. This needs tests for
Hmm yes, this is a bug in current implementation. I'll fix this later. |
def err_pp_module_decl_in_header | ||
: Error<"module declaration must not come from an #include directive">; | ||
def err_pp_cond_span_module_decl | ||
: Error<"preprocessor conditionals shall not span a module declaration">; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are both expressions of the fact that a module
directive is neither a control-line nor a text-line. Which is to say that there should be a preprocessor diagnostic (effective with -E
) that triggers whenever a module
directive is encountered where a control-line or a text-line is required.
In particular, such a diagnostic should only refer to "module directive"s and not to "module declaration"s.
A diagnostic is required for the following:
module; #if 0 export module m; #endif // expected-error@-2 {{}} export module m2;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, these 2 diagnostic's wording is not correct here, need update module declaration
to module directive
. But in this code snippet, module directive in a skipped conditional block. IIUC, we should not touch the text in skipped block? Did you mean #if 1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, we should not touch the text in skipped block? Did you mean
#if 1
?
I don't mean #if 1
. There is actually no "skipped block" in the above because the module directive is not a text-line (https://wg21.link/cpp.pre#2) nor anything else that is allowed in a group (https://eel.is/c++draft/cpp.pre#nt:group-part).
"module '%0' conflicts with already-imported module '%1': %2">, | ||
InGroup<ModuleConflict>; | ||
| ||
// C++20 modules |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the restriction on import
directives in the pp-global-module-fragment has not been implemented?
module; import m2; // expected-error {{}} export module m;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this restriction was not implemented yet. We can implement this in a separate patch. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this restriction was not implemented yet. We can implement this in a separate patch. WDYT?
I think the cxx_status.html
change should be kept as partial in this PR then.
@Bigcheese, your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think partially implemented is good.
def err_pp_module_expected_ident : Error< | ||
"expected %select{identifier after '.' in |}0module name">; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .
is technically part of the pp-tokens following the module name in such a case. In other words, this is a special case where we do currently diagnose the ;
/[
restriction on the pp-token following the module name/partition in the preprocessor.
Contrast with the case where the module name is followed by (
and the last component is the name of a function-like macro. In that case, the ;
/[
restriction does not immediately apply (because it applies after macro expansion), but we currently diagnose it in phase 7 using the ;
/[
restriction.
TL;DR: Instead of having this preprocessor diagnostic, there should be preprocessor diagnostics for the disallowed (
case and for the ;
/[
restriction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM! Since we have alread implement CWG2947 in this patch, I have a question about the following case:
// #define m(x) x export module m (foo);
Should we emit an diagnostic the (
in line 2 in phase 7? We can't seen it in the handling of module directive in phase 4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we emit an diagnostic the
(
in line 2 in phase 7?
I think you found a bug in the resolution of CWG2947. The restriction on (
was not extended to tokens from the next line.
As for diagnosing this in phase 7, how is phase 7 supposed to differentiate between the above and
// #define m(x) x // #define LPAREN ( export module m LPAREN foo);
?
For the (
case, is there a way to look ahead to see if the next preprocessing-token (at the start of phase 4) is (
?
Signed-off-by: yronglin <yronglin777@gmail.com>
Fixed. |
def ext_pp_extra_tokens_at_module_directive_eol | ||
: Warning<"extra tokens at end of '%0' directive">, | ||
InGroup<ExtraTokens>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name should probably not start with ext_
.
I'm not sure that having this in the same group as ext_pp_extra_tokens_at_eol
is the best design. It is more "legitimate" to want to suppress this message than the other one.
Suggestion for the wording:
"tokens after semicolon in '%0' directive"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM, I'll update this.
This PR implement the following papers:
P1857R3 Modules Dependency Discovery.
P3034R1 Module Declarations Shouldn’t be Macros.
CWG2947.
At the start of phase 4 an import or module token is treated as starting a directive and are converted to their respective keywords iff:
Otherwise the token is treated as an identifier.
Additionally:
After this patch, we handle C++ module-import and module-declaration as a real pp-directive in preprocessor. Additionally, we refactor module name lexing, remove the complex state machine and read full module name during module/import directive handling. Possibly we can introduce a tok::annot_module_name token in the future, avoid duplicatly parsing module name in both preprocessor and parser, but it's makes error recovery much diffcult(eg. import a; import b; in same line).
This patch also introduce 2 new keyword
__preprocessed_module
and__preprocessed_import
. These 2 keyword was generated during-E
mode. This is useful to avoid confusion withmodule
andimport
keyword in preprocessed output:Fixes #54047