Java Lucene stemming: removing stop words and stemming with Lucene
Date: 2024-01-15 14:03:04
Hello! To remove stop words and apply stemming, you can use Lucene's tokenizers and analyzers. Here is a simple example:
```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

import java.io.IOException;
import java.io.StringReader;

public class LuceneStemmingExample {
    public static void main(String[] args) throws IOException {
        String text = "Lucene is a Java full-text search engine. " +
                "Lucene is not a complete application, but rather a code library " +
                "and lucene is used to add search capability to applications.";

        // StandardAnalyzer tokenizes, lowercases, and removes the given stop words.
        // Analyzer constructors no longer take a Version argument in current Lucene releases.
        try (Analyzer analyzer = new StandardAnalyzer(EnglishAnalyzer.getDefaultStopSet());
             // PorterStemFilter is a TokenFilter: it wraps a TokenStream, not an Analyzer.
             TokenStream tokenStream = new PorterStemFilter(
                     analyzer.tokenStream("", new StringReader(text)))) {
            CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
            tokenStream.reset(); // required before the first incrementToken()
            while (tokenStream.incrementToken()) {
                System.out.println(term.toString());
            }
            tokenStream.end();
        } // try-with-resources closes the token stream, then the analyzer
    }
}
```
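In the example above, the analyzer tokenizes the text and removes stop words, and `PorterStemFilter` then stems each remaining token. Because `PorterStemFilter` is a `TokenFilter`, it wraps the analyzer's `TokenStream` rather than the `Analyzer` itself. The output is: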
```
lucen
java
full
text
search
engin
lucen
complet
applic
rather
code
librari
lucen
us
add
search
capabl
applic
```
As you can see, the stop words have been removed and the remaining tokens have been stemmed. Hope this example helps!
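If it helps to see what the stop word stage does on its own, here is a minimal plain-Java sketch with no Lucene dependency. It is not Lucene's implementation: the class name `StopWordSketch` and the method `analyze` are made up for illustration, the regex split is only a rough stand-in for `StandardTokenizer`, and the word list is hand-copied on the assumption that it matches Lucene's default English stop set. Stemming is left out on purpose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; not part of Lucene.
final class StopWordSketch {
    // Assumed to mirror Lucene's default English stop word list (copied by hand).
    static final Set<String> STOP_WORDS = Set.of(
            "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
            "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
            "the", "their", "then", "there", "these", "they", "this", "to", "was",
            "will", "with");

    // Tokenize, lowercase, and drop stop words. No stemming happens here.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Simplified tokenizer: split on runs of characters that are not letters or digits.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [lucene, complete, application, rather, code, library]
        System.out.println(analyze(
                "Lucene is not a complete application, but rather a code library"));
    }
}
```

In a real project you would keep using Lucene's own stop filtering and `PorterStemFilter`; the sketch only shows why words such as "is", "not", "a", and "but" never reach the stemmer.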