Compiler Principles: write a simple tokenizer in C++ that can read in and output a document

Posted: 2024-11-06 18:22:05

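In compiler construction, a tokenizer (lexer) usually involves string processing and techniques such as regular expressions. Here is a simple C++ tokenizer that reads a text file and splits its contents into words on whitespace:

```cpp
#include <fstream>
#include <iostream>
#include <iterator>   // std::istreambuf_iterator
#include <sstream>
#include <string>
#include <vector>

// Split text into tokens separated by whitespace (spaces, tabs, newlines).
std::vector<std::string> tokenize(const std::string& text) {
    std::istringstream iss(text);
    std::vector<std::string> tokens;
    std::string token;
    // operator>> skips leading whitespace and reads up to the next whitespace,
    // so every extracted token is already non-empty.
    while (iss >> token) {
        tokens.push_back(token);
    }
    return tokens;
}

int main() {
    std::ifstream inputFile("input.txt");  // assumes the input file is named "input.txt"
    if (!inputFile.is_open()) {
        std::cerr << "Failed to open input file." << std::endl;
        return 1;
    }

    // Read the entire file into one string in a single step.
    std::string content((std::istreambuf_iterator<char>(inputFile)),
                        std::istreambuf_iterator<char>());
    inputFile.close();

    // Print the tokenization result, separated by spaces.
    std::vector<std::string> words = tokenize(content);
    for (const auto& word : words) {
        std::cout << word << " ";
    }
    std::cout << std::endl;
    return 0;
}
```

The program first opens a text file (assumed here to be named `input.txt`) and reads its entire contents into a single string at once via `std::istreambuf_iterator`. The `tokenize` function then wraps that string in a `std::istringstream` and uses `operator>>` to split it on whitespace, producing a vector of words. Finally, the program iterates over the vector and prints the tokens. Note that this is only a basic tokenizer: punctuation attached to a word stays part of it (for example, `hello,` is a single token), and a compiler's lexer would additionally classify each token by kind (identifier, keyword, number, operator, and so on). Real applications may also need to handle more complex concerns such as stop-word filtering, punctuation removal, and lemmatization.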
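Because a whitespace split keeps punctuation glued to words and does not classify tokens the way a compiler's lexer does, here is a rough sketch of the character-by-character scanning a lexer typically performs. The hypothetical `lex` function walks the string once and emits word, number, and single-character punctuation tokens. The names `lex`, `Token`, and `TokenKind` and the three categories are illustrative assumptions, not part of the original answer, and the code assumes ASCII input (each byte of a multi-byte UTF-8 character would come out as a separate punctuation token).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token categories for illustration only; a real compiler's
// lexer would derive its categories (keywords, operators, ...) from the grammar.
enum class TokenKind { Word, Number, Punct };

struct Token {
    TokenKind kind;
    std::string text;
};

// Scan the text one character at a time instead of splitting on whitespace,
// so punctuation glued to a word ("hello," or "x+1") becomes its own token.
// Assumes ASCII input.
std::vector<Token> lex(const std::string& text) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < text.size()) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isspace(c)) {
            // Whitespace only separates tokens; skip it.
            ++i;
        } else if (std::isalpha(c) || c == '_') {
            // Word: a letter or underscore, followed by letters, digits, underscores.
            std::size_t start = i;
            while (i < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[i])) || text[i] == '_')) {
                ++i;
            }
            tokens.push_back({TokenKind::Word, text.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            // Number: a run of decimal digits.
            std::size_t start = i;
            while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) {
                ++i;
            }
            tokens.push_back({TokenKind::Number, text.substr(start, i - start)});
        } else {
            // Any other character becomes a one-character punctuation token.
            tokens.push_back({TokenKind::Punct, std::string(1, text[i])});
            ++i;
        }
    }
    return tokens;
}
```

For example, `lex("x1 = 42+y;")` yields six tokens: `x1`, `=`, `42`, `+`, `y`, `;`. Keywords could then be recognized by checking each `Word` token against a keyword table after scanning.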

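The program above only prints the tokens to the console. If the tokenized result should also be written out as a document, one possible sketch is a small helper that writes one token per line to a file. The function name `writeTokens` and the one-token-per-line layout are assumptions made for illustration.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: write each token on its own line to the file at `path`.
// Returns false if the file could not be opened or a write failed.
bool writeTokens(const std::vector<std::string>& tokens, const std::string& path) {
    std::ofstream out(path);
    if (!out) {
        return false;
    }
    for (const auto& token : tokens) {
        out << token << '\n';
    }
    return static_cast<bool>(out);
}
```

In `main`, calling `writeTokens(words, "output.txt")` after `tokenize` would produce the output document; the file name `output.txt` is likewise hypothetical.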