Migrating your OpenShift Elasticsearch 6.x cluster to Elastic Cloud on Kubernetes (ECK)

张开发

• 2026/6/12 6:15:42 • 15 min read


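By Omer Kushmaro and Jamie Parker, Elastic

A step-by-step guide to migrating from the legacy OpenShift Elasticsearch Operator (ES 6.x) to the modern Elastic Cloud on Kubernetes (ECK).

Red Hat's OpenShift platform has long been a trusted foundation for enterprise Kubernetes workloads, and its built-in Elasticsearch Operator simplified log management for years. But good things evolve: the OpenShift Elasticsearch Operator reached the end of its supported lifecycle in Red Hat OpenShift Container Platform (OCP) 4.13, and the Elasticsearch 6.x clusters it manages have long been out of support.

Working closely with Red Hat, we put together this step-by-step guide to help you move from the old setup to Elastic Cloud on Kubernetes (ECK), the modern, full-featured operator maintained directly by Elastic. The migration path respects the OpenShift-native tooling you already rely on, keeps disruption to a minimum, and lays a solid foundation for future upgrades to 8.x, 9.x, and beyond.

Why it matters

Security and support: The last patch release of Elasticsearch 6.x shipped on January 13, 2022. Staying on an old Elasticsearch version exposes you to support risk and known security issues. ECK lets you upgrade at your own pace with a supported operator from the creators of Elasticsearch.

Features you are missing: Autoscaling, data tiers, machine learning (ML) jobs, searchable snapshots. None of these exist in the old operator.

Future-proof operations: ECK ships alongside every new Elastic release, so you are never stuck waiting.

High-level plan

Phase | Goal | Result
0 | Snapshot and sanity-check your 6.x cluster. | You have a backup in case you need it.
1 | Install ECK 2.16.1 alongside the Red Hat operator. | Both operators coexist safely.
2 | Spin up a new, production-ready ES 6.8.23 cluster managed by ECK. | An empty ECK-managed cluster.
3 | Restore data into the new cluster. | All indices now run under ECK.
4 | Point openshift-logging at the new service and decommission the old operator. | A single source of truth.
5 | Rolling-upgrade Elasticsearch to 7.17.29. | Latest long-term 7.x release.
6 | Upgrade ECK to 3.3.1. | The operator is on a current release.
7 | Schedule your own upgrades to 8.x and 9.x. | You control the timing.
8 | Clean up: remove the old operator. |

Feel free to bookmark this list. Each milestone is small, reversible, and validated before you move on.

0. Pre-checks

A. Health first: run /_cat/health and make sure the status is green.
B. Disk watermarks: keep at least 20% free space before starting the migration.
C. Final snapshot: S3, GCS, or NFS all work, as long as you can mount the same repository in the new cluster. If no object storage is available in your environment, you can follow Red Hat's solution post to snapshot data to local storage on the OpenShift cluster.
D. Review the documentation: Elastic provides detailed documentation on migrating data between Elasticsearch clusters.

1. Install ECK 2.16.1 (your "bridge" operator)

ECK 2.16.1 is the last release that still accepts spec.version: 6.8.x, which makes it the ideal bridge between past and future Elasticsearch versions.

    helm repo add elastic https://helm.elastic.co
    helm repo update
    oc create namespace elastic-system
    helm install elastic-operator elastic/eck-operator --version 2.16.1 -n elastic-system --create-namespace

You can keep the Red Hat operator in place. The two operators watch different Custom Resource Definitions (CRDs), so they do not interfere with each other.

Note that on OpenShift the ECK logs may show some Transport Layer Security (TLS) errors, because OpenShift tries to reach ECK's healthcheck webhook endpoint over HTTP while ECK only accepts TLS traffic. This is a known issue and causes no real harm.

For a locally namespaced installation, see the Elastic documentation.

2. Start a 6.x cluster under ECK

Below is a starter Kubernetes manifest that aims to balance resilience and cost with three hot-tier data nodes. As written, these nodes also carry the master and ingest roles; add a dedicated master nodeSet if you need separate masters. Replace the storage class name, resources, and snapshot credentials to match your environment. Note that the syntax used here differs slightly from what you would use to deploy newer Elasticsearch versions on ECK.

    apiVersion: elasticsearch.k8s.elastic.co/v1
    kind: Elasticsearch
    metadata:
      name: es-logs
      namespace: elastic  # Create this namespace prior, or use another namespace
    spec:
      version: 6.8.23
      nodeSets:
      - name: hot
        count: 3
        volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
            - ReadWriteOnce
            storageClassName: gp3-csi  # adjust if needed
            resources:
              requests:
                storage: 100Gi  # storage size may vary; adjust for your environment
        config:
          node.master: true
          node.data: true
          node.ingest: true
          node.attr.data: hot
          cluster.routing.allocation.awareness.attributes: data
        podTemplate:
          spec:
            containers:
            - name: elasticsearch
              resources:
                requests:
                  memory: 16Gi
                  cpu: 2
                limits:
                  memory: 16Gi
    ---
    apiVersion: kibana.k8s.elastic.co/v1
    kind: Kibana
    metadata:
      name: kibana
      namespace: elastic
    spec:
      version: 6.8.23
      count: 1
      elasticsearchRef:
        name: es-logs
      podTemplate:
        spec:
          containers:
          - name: kibana
            resources:
              requests:
                memory: 1Gi
                cpu: 0.5
              limits:
                memory: 4Gi

Deploy it, watch the pods come up, and you are ready to import data.

3. Migrate the data

For more detail on moving data from one Elasticsearch cluster to another, see the corresponding guide in the Elastic documentation. In this article we assume the snapshot and restore method, which is the fastest approach:

    # On the old cluster, register the repository and take a snapshot
    PUT _snapshot/log-backups
    {
      "type": "s3",
      "settings": { ... }
    }

    PUT _snapshot/log-backups/final-snap-2025-08-07

    # On the new cluster, register the same repository (read-only!)
    PUT _snapshot/log-backups
    {
      "type": "s3",
      "settings": {
        "readonly": true,
        ...
      }
    }

    # Perform the restore operation
    POST _snapshot/log-backups/final-snap-2025-08-07/_restore

Can't share an object store? Use remote reindex (slower, but it works in any environment; the downside is that it does not migrate index templates, component templates, and so on), or feed the logs through a one-off Logstash job.

4. Configure the ClusterLogging operator

First, we need to decommission the Elasticsearch cluster managed by the Red Hat operator. Edit the ClusterLogging resource:

    oc edit clusterlogging instance -n openshift-logging

and change the following fields under spec:

    logStore:
      elasticsearch:
        nodeCount: 0  # scale the node count down to 0
        redundancyPolicy: ZeroRedundancy
      type: elasticsearch
    managementState: Managed  # this needs to be kept, as it will manage the fluentd instance for us
    visualization:
      kibana:
        replicas: 0  # scale down kibana as well
      type: kibana

Next, we define a ClusterLogForwarder that directs logs from fluentd to the new ECK-managed Elasticsearch 6.x cluster. It needs a secret containing the Elasticsearch credentials:

    oc create secret generic eck-es-credentials \
      -n openshift-logging \
      --from-literal=username=elastic \
      --from-literal=password="$(oc get secret es-logs-es-elastic-user -n elastic -o jsonpath='{.data.elastic}' | base64 -d)"

To configure TLS (as recommended), create a ConfigMap for the ClusterLogForwarder so that it trusts the ECK CA certificate. Further guidance is available in the linked documentation. We run the following commands:

    oc -n elastic get secret es-logs-es-http-certs-public \
      -o go-template='{{index .data "tls.crt" | base64decode}}' > ca.crt

    oc -n openshift-logging create configmap eck-es-ca \
      --from-file=ca-bundle.crt=ca.crt

With the CA bundle created, we then reference it from the ClusterLogForwarder.

⚠️ If you are troubleshooting connectivity, you can temporarily set tls.insecureSkipVerify: true, but do not leave it on long term.

Because we restored old indices into a new ECK-managed cluster, OpenShift Logging does not automatically recreate the old index layout or aliases. You must make sure the write aliases exist and point to writable indices. In my case, I had to confirm that the following aliases were set up correctly:

app-write
infra-write
audit-write

I pointed them at indices with dynamic mappings (not recommended) to minimize errors and troubleshooting steps.

    # Forward the ES port to the local machine (run in a separate terminal)
    oc -n elastic port-forward svc/es-logs-es-http 9200:9200

    PASS=$(oc -n elastic get secret es-logs-es-elastic-user -o jsonpath='{.data.elastic}' | base64 -d)

    # Make sure the write alias points to the correct backing index
    curl -s -k -u "elastic:${PASS}" -XPOST https://localhost:9200/_aliases \
      -H 'Content-Type: application/json' \
      -d '{
        "actions": [
          { "add": { "index": "infra-000002", "alias": "infra-write", "is_write_index": true } }
        ]
      }'

Repeat for app-write and audit-write with their respective backing indices. Data should now start flowing into the new ECK-managed cluster.

5. Rolling upgrade to 7.17.29, and verify

Now you can finally say goodbye to 6.x.

A. Use curl to query _xpack/migration/deprecations?pretty on Elasticsearch and address any deprecations. This API returns the warnings and critical items that need attention before you upgrade.

B. Modify the Elasticsearch custom resource to upgrade it to the latest 7.x release. I used 7.17.29.

    oc -n elastic patch elasticsearch es-logs --type=merge -p '{"spec":{"version":"7.17.29"}}'

C. ECK restarts one node at a time. Your cluster should stay online throughout.

D. Give cluster tasks and shard recovery time to settle before you continue.

E. Don't forget to upgrade Kibana the same way.

    oc -n elastic patch kibana kibana --type=merge -p '{"spec":{"version":"7.17.29"}}'

When that completes, check the Elasticsearch and Kibana versions and their health status:

    oc -n elastic get elasticsearch es-logs
    oc -n elastic get kibana kibana

6. Operator upgrade: ECK 2.16.1 → 3.3.1

The ECK upgrade itself is pleasantly uneventful:

    helm upgrade elastic-operator elastic/eck-operator -n elastic-system --version 3.3.1

Watch the operator pod roll. Your Elasticsearch cluster keeps running; only the controller restarts. Verify the upgrade by checking the operator logs and making sure no major errors appear:

    oc logs -n elastic-system sts/elastic-operator

Then confirm the operator's new version, which should now be 3.3.1:

    helm -n elastic-system list

7. Roadmap to 8.x and 9.x (when you are ready)

You are now on:

ECK Operator: 3.3.1
Elastic Stack: 7.17.29

This combination is fully supported and serves as the official starting point for upgrading to 8.x. Reading the Elastic upgrade documentation first is essential.

Check once more for breaking changes between 7.17.29 and the latest 8.x release (8.19.9):

    GET _migration/deprecations?pretty

Review the output of this query carefully and carry out the necessary steps, such as reindexing indices or changing mappings. Once you have made every change required to go from 7.17.29 to 8.x:

    oc -n elastic patch elasticsearch es-logs --type=merge -p '{"spec":{"version":"8.19.9"}}'
    oc -n elastic patch kibana kibana --type=merge -p '{"spec":{"version":"8.19.9"}}'

ECK handles the rest. Just remember to upgrade Beats, Logstash pipelines, and client libraries at the same time to avoid wire-protocol surprises. Repeat the process to move to the latest 9.x release.

8. Cleanup: remove the Red Hat Elasticsearch operator

Now that you no longer use the Red Hat Elasticsearch operator, you can remove it from the cluster:

A. In the OpenShift console, go to Operators, then Installed Operators.
B. In the Filter By Name field, type "Elasticsearch" to find the installed Red Hat Elasticsearch operator.
C. On the Operator Details page, select Uninstall Operator from the Actions list.
D. In the Uninstall Operator? dialog, select Uninstall. This removes the operator, the operator deployments, and the pods. After this step the operator stops running and no longer receives updates.

All of these steps are described in the Red Hat OpenShift documentation.

Summary

By installing ECK 2.16.1 as a bridge, restoring a snapshot into a new cluster, and cleanly completing the 7.x upgrade before moving to ECK 3.3, you have turned an aging, unsupported logging backend into a modern, secure, first-class Elastic deployment, with no surprises and zero downtime.

Original: https://www.elastic.co/search-labs/blog/openshift-elastic-cloud-kubernetes-migration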
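
As a companion to the write-alias step in section 4, which has to be repeated for app-write, infra-write, and audit-write, here is a minimal shell sketch that generates the three _aliases request bodies. The helper name alias_payload is hypothetical, and the backing index names app-000002 and audit-000002 are assumptions for illustration (only infra-000002 appears in the article), so check your real index names before using anything like this.

```shell
# Hypothetical helper: print an _aliases request body that marks the index in
# $1 as the write index for the alias in $2.
alias_payload() {
  printf '{"actions":[{"add":{"index":"%s","alias":"%s","is_write_index":true}}]}\n' "$1" "$2"
}

# Print one payload per write alias. The app and audit index names are
# assumptions; substitute the backing indices that exist in your cluster.
for pair in app-000002:app-write infra-000002:infra-write audit-000002:audit-write; do
  # ${pair%%:*} is the text before the colon, ${pair##*:} the text after it.
  alias_payload "${pair%%:*}" "${pair##*:}"
done
```

Each printed line could then be piped to the same curl call used in section 4 by replacing the inline body with -d @-, which makes curl read the request body from standard input. Generating the body in one place keeps the three requests identical apart from the index and alias names.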