Hive Component Usage

Connection Methods

Connecting to Hive Server2 via Beeline

Environment Preparation

Prepare a pod for the Hive client. Create a new YAML file locally, for example, hive-client.yaml, and fill in the following content.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hive-client
  namespace: default
  labels:
    app: hive-client
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hive-client
  template:
    metadata:
      labels:
        app: hive-client
    spec:
      volumes:
        - name: hive-config
          configMap:
            name: hive-server2-context
            defaultMode: 420
        - name: hdfs-config
          configMap:
            name: hdfs-config
            defaultMode: 420
        - name: kerberos-config
          configMap:
            name: krb5-config
            defaultMode: 420
        - name: user-keytab
          persistentVolumeClaim:
            claimName: home-keytab-data-pvc
      containers:
        - name: hive-client
          image: od-registry.linktimecloud.com/ltc-hms:3.1.3-1.17
          command:
            - tail
            - '-f'
            - /dev/null
          env: []
          resources:
            limits:
              cpu: '2'
              memory: 512Mi
            requests:
              cpu: 100m
              memory: 512Mi
          volumeMounts:
            - name: hive-config
              readOnly: true
              mountPath: /opt/hive/conf/hive-site.xml
              subPath: hive-site.xml
            - name: hdfs-config
              readOnly: true
              mountPath: /opt/hive/conf/core-site.xml
              subPath: core-site.xml
            - name: hdfs-config
              readOnly: true
              mountPath: /opt/hive/conf/hdfs-site.xml
              subPath: hdfs-site.xml
            - name: kerberos-config
              readOnly: true
              mountPath: /etc/krb5.conf
              subPath: krb5.conf
            - name: user-keytab
              readOnly: true
              mountPath: /keytab
          imagePullPolicy: IfNotPresent
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      securityContext: {}
      imagePullSecrets:
        - name: devregistry
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - hive-client
              topologyKey: kubernetes.io/hostname

Notes:

  • Set namespace to the namespace where the hdfs/hive-metastore/hive-server2 cluster is deployed.
  • The container image is the hive-metastore image.
  • The ConfigMap hive-server2-context contains the Hive Server2 configuration, including hive-site.xml.
  • The ConfigMap hdfs-config contains the HDFS configuration, including core-site.xml and hdfs-site.xml.
  • The ConfigMap krb5-config contains the KDC configuration, including krb5.conf.
  • The PVC home-keytab-data-pvc contains the user's keytab.

If Kerberos is not enabled in the cluster, you can remove the volumes and volumeMounts used for the KDC configuration and the keytab (kerberos-config and user-keytab).

Execute kubectl apply -f hive-client.yaml to create a Hive client pod.

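Then open a shell in the Hive client pod. The pod is created by a Deployment, so its name carries a generated suffix; addressing it through the Deployment avoids looking up the exact pod name. For example, in the default namespace:

kubectl exec -it deploy/hive-client -n default -- bash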

Connecting in High Availability Mode

Each hive-server2 instance registers its service address in ZooKeeper, and a client obtains the address of a randomly chosen instance from ZooKeeper.

If Kerberos is enabled, first execute kinit, then connect with Beeline.

kinit -kt /keytab/user1/user1.keytab user1
beeline -u 'jdbc:hive2://zookeeper:2181/;serviceDiscoveryMode=zookeeper;zooKeeperNamespace=default_hiveserver2/server;principal=hive/_HOST@BDOS.CLOUD'

If Kerberos is not enabled, you can connect directly as the root user.

beeline -u 'jdbc:hive2://zookeeper:2181/;serviceDiscoveryMode=zookeeper;zooKeeperNamespace=default_hiveserver2/server' -n root

Note: The zooKeeperNamespace value in the Beeline connection strings above has the form <namespace>_hiveserver2/server, where <namespace> is the namespace the Hive cluster is deployed in. The examples use the default namespace. If the namespace is admin, use zooKeeperNamespace=admin_hiveserver2/server instead.
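The namespace rule in the note can be captured in a small helper when the same connection string is needed from a JDBC client. The following is a minimal sketch, not part of the product: the class name HiveZkUrl and its build method are hypothetical, and the ZooKeeper address and principal are simply the values used in the Beeline examples.

```java
import java.util.Objects;

/** Hypothetical helper: builds a HiveServer2 JDBC URL that uses ZooKeeper service discovery. */
final class HiveZkUrl {

    private HiveZkUrl() {}

    /**
     * @param zkQuorum     ZooKeeper address, for example "zookeeper:2181"
     * @param k8sNamespace namespace the Hive cluster is deployed in, for example "default"
     * @param principal    HiveServer2 Kerberos principal, or null when Kerberos is not enabled
     */
    static String build(String zkQuorum, String k8sNamespace, String principal) {
        Objects.requireNonNull(zkQuorum, "zkQuorum");
        Objects.requireNonNull(k8sNamespace, "k8sNamespace");
        StringBuilder url = new StringBuilder("jdbc:hive2://")
                .append(zkQuorum)
                .append("/;serviceDiscoveryMode=zookeeper")
                // The ZooKeeper namespace is the deployment namespace plus a fixed suffix.
                .append(";zooKeeperNamespace=")
                .append(k8sNamespace)
                .append("_hiveserver2/server");
        // Append the principal only for Kerberos-enabled clusters.
        if (principal != null && !principal.isEmpty()) {
            url.append(";principal=").append(principal);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("zookeeper:2181", "default", "hive/_HOST@BDOS.CLOUD"));
        System.out.println(build("zookeeper:2181", "admin", null));
    }
}
```

Without Kerberos, the user name (root in the examples above) is still supplied separately, for instance through Beeline's -n option.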

Connecting to a Specific Hive Server2 Instance

You can also connect to a specific instance of hive-server2.

# With Kerberos enabled:
beeline -u 'jdbc:hive2://hive-server2-0.hive-server2:10000/;principal=hive/_HOST@BDOS.CLOUD'
# Without Kerberos enabled:
beeline -u 'jdbc:hive2://hive-server2-0.hive-server2:10000/' -n root
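The two direct-connection URLs differ only in the optional principal parameter, which a JDBC client could assemble as sketched below. This is an illustration under assumptions: the class name HiveDirectUrl is hypothetical, and generalizing the host name hive-server2-0.hive-server2 to other instance numbers assumes that the instances follow the same numbered naming pattern, which this document does not state.

```java
/** Hypothetical helper: builds a JDBC URL for one specific hive-server2 instance. */
final class HiveDirectUrl {

    // Port taken from the Beeline examples above.
    private static final int HS2_PORT = 10000;

    private HiveDirectUrl() {}

    /**
     * @param ordinal   instance number, for example 0 for hive-server2-0 (assumed naming pattern)
     * @param principal HiveServer2 Kerberos principal, or null when Kerberos is not enabled
     */
    static String build(int ordinal, String principal) {
        if (ordinal < 0) {
            throw new IllegalArgumentException("ordinal must not be negative: " + ordinal);
        }
        String url = "jdbc:hive2://hive-server2-" + ordinal + ".hive-server2:" + HS2_PORT + "/";
        // Append the principal only for Kerberos-enabled clusters.
        if (principal == null || principal.isEmpty()) {
            return url;
        }
        return url + ";principal=" + principal;
    }

    public static void main(String[] args) {
        System.out.println(build(0, "hive/_HOST@BDOS.CLOUD"));
        System.out.println(build(0, null));
    }
}
```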

Connecting to Hive Server2 via Java

Configure project dependencies (hadoop-common and hive-jdbc) in the pom.xml file. The additional project dependencies for this example are as follows.

<dependencies>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>3.1.3</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.1.1</version>
  </dependency>
</dependencies>

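Write code to connect to HiveServer2 and query Hive table data. The example below connects to a specific instance without Kerberos and reads from a table named sample_tbl, which must have at least two columns. The example code is as follows.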

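import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class App {
    private static final String DRIVER_NAME = "org.apache.hive.jdbc.HiveDriver";
    // Direct connection to one HiveServer2 instance (Kerberos not enabled).
    private static final String JDBC_URL = "jdbc:hive2://hive-server2-0.hive-server2:10000";

    public static void main(String[] args) throws SQLException {
        // Load the Hive JDBC driver; stop here if it is missing from the classpath.
        try {
            Class.forName(DRIVER_NAME);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Hive JDBC driver not found on the classpath", e);
        }

        String sql = "select * from sample_tbl limit 10";

        // try-with-resources closes the result set, statement, and connection automatically.
        try (Connection con = DriverManager.getConnection(JDBC_URL, "root", "");
             Statement stmt = con.createStatement();
             ResultSet res = stmt.executeQuery(sql)) {
            // Print the first two columns of each row, separated by a tab.
            while (res.next()) {
                System.out.println(res.getString(1) + "\t" + res.getString(2));
            }
        }
    }
}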