Image by xresch from Pixabay

Elasticsearch-01

simple install and query with java

Posted by Yvan on July 30, 2022

1 Install with Docker-Compose

1 安装docker,
2 在项目根目录下建立 docker-compose.yml 文件
  其中参数
  -ES_JAVA_OPTS 用于限制内存大小,
  -bootstrap.memory_lock 是否允许节点内存交换,锁住内存地址可以防止硬盘频繁读取,
  -xpack.security.enabled 用于保护数据,有多个不同权限账户,账户需设置密码,未设置时是默认密码,

1
2
3
4
5
6
7
8
9
10
11
12
13
services:
  myelastic:
    image: elasticsearch:8.3.2
    container_name: myelastic
    environment:
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "discovery.type=single-node"
      - "network.host=0.0.0.0"
      - "xpack.security.enabled=false"
    ports:
      - "9200:9200"
      - "9300:9300"

3 运行ES:在终端中运行 docker-compose up --build --force-recreate
--build 重制 image,
--force-recreate 重做 container
保证更改被应用,同时每次开启后的ES都是新的空的(如果要保留每次运行的数据,则不要 –force-recreate)
4 成功启动后,可以访问/curl http://localhost:9200/ 会得到json

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
{ 
  "name" : "831d47aa0e5f",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "NUSoyfauSseaNyz0iFD-NQ",
  "version" : {
    "number" : "8.3.2",
    "build_type" : "docker",
    "build_hash" : "8b0b1f23fbebecc3c88e4464319dea8989f374fd",
    "build_date" : "2022-07-06T15:15:15.901688194Z",
    "build_snapshot" : false,
    "lucene_version" : "9.2.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

5 可能的问题:
  virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144] 解决方法

2 Query with Java

这里先跳过往Elasticsearch里放入数据的indexing部分,
假设现在有一个叫做lookup的索引(index),其有label,synonyms,acronyms,broaders四个fields

1 终端 curl 格式
搜索和”Natural History”有关的数据

1
2
3
4
5
6
7
8
9
10
curl -X POST http://localhost:9200/lookup/_search
     -H "Content-Type: application/json"
     -d {"query": {
         "query_string": {
            "query": "Natural History"
         }},
         "size": 100,
         "from": 0,
         "sort": []
        }

-d 的json部分可以替换为其他各种类型的search格式,它和java中的相同,以下列举一些常用格式以及部分参数
  - match query
 特点:对单个filed搜索。

1
2
3
4
5
6
{"query": {
    "match": {
      "label": {                        # fieldname
        "query": "Natural History",     # keyword
        "fuzziness": "AUTO"             # fuzziness 0/1/2/AUTO
}}}}

  - multi_match query
 特点:对多个fileds搜索。

1
2
3
4
5
6
{"query": {
    "multi_match" : {
      "query" : "this is a test",
      "fields" : ["subject^3", "message" ], # fieldname list, 可设置不同的权重 ^2 ^3 ^4 ...
      "fuzziness": "AUTO"
}}}

  - query_string query
 特点:可以使用Query String Syntax语法, 如 “(new york city) OR (big apple)”。若无需求,请使用match或者multi-match

1
2
3
4
5
6
{"query": {
    "query_string": {
        # 模糊搜索时,需要在String后加上"~" 如: quikc~ brwn~ foks~
        "query": "(new york city) OR (big apple)",
        "fields": ["label^2", "synonyms", "acronyms", "broaders"]
}}}

2 Java中使用时的负载格式

1
2
3
4
5
6
7
8
9
10
11
// 为使json格式易于观察,使用"+"使String分行
String query_multimatch
        = "{\"query\": {" +
                "\"multi_match\": {" +
                    "\"query\": \""+ searchText +"\"," +
                    "\"fields\": [\"label^2\", \"synonyms\", \"acronyms\", \"broaders\"]," +
                    "\"fuzziness\": \"AUTO\"" +
                "}}," +
            "\"size\": 10," +
            "\"from\": 0," +
            "\"sort\": []}";

3 使用Java curl

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
//esUrl = esHost + indexname + _search
String esUrl = "http://localhost:9200/lookup/_search"; 
String searchText = "Quantum";

// establish connection 
URL obj = new URL(esUrl);
HttpURLConnection con = (HttpURLConnection) obj.openConnection();

// set connection
con.setRequestMethod("POST");
con.setDoOutput(true);
con.setRequestProperty("Content-Type", "application/json");

// set query string
String query_multimatch
        = "{\"query\": {" +
                "\"multi_match\": {" +
                    "\"query\": \""+ searchText +"\"," +
                    "\"fields\": [\"label^2\", \"synonyms\", \"acronyms\", \"broaders\"]," +
                    "\"fuzziness\": \"AUTO\"" +
                "}}," +
            "\"size\": 10," +
            "\"from\": 0," +
            "\"sort\": []}";

// query and get response
byte[] out = query_multimatch.getBytes(StandardCharsets.UTF_8);
OutputStream stream = con.getOutputStream();
stream.write(out);
BufferedReader iny = new BufferedReader(new InputStreamReader(con.getInputStream()));
String output;
StringBuffer response = new StringBuffer();
while ((output = iny.readLine()) != null) {response.append(output);}
iny.close();
con.disconnect();

// get JSON
JSONObject responseJason = new JSONObject(response.toString());
JSONArray results = responseJason.getJSONObject("hits").getJSONArray("hits");