Error Log:
java.lang.OutOfMemoryError: GC overhead limit exceeded or java.lang.OutOfMemoryError: Java heap space
Solution: Increase the memory allocated to the container, for example:
set spark.executor.memory=4g;
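If the job runs on the MapReduce engine rather than Spark, the container memory is controlled by the MapReduce settings instead; the 4096 MB values below are illustrative, not recommendations:
set mapreduce.map.memory.mb=4096;
set mapreduce.reduce.memory.mb=4096;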
Error Log:
22/11/28 08:24:43 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 0)
java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:611)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:813)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:719)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:787)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:148)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:547)
Root Cause: The hash table used by the GroupBy operator consumes too much memory, leading to an out-of-memory (OOM) error.
Solutions:
Reduce the split size so that each map task processes less data by setting
mapreduce.input.fileinputformat.split.maxsize=134217728
or mapreduce.input.fileinputformat.split.maxsize=67108864
in the configuration.
Increase the number of concurrent tasks by raising the value of spark.executor.instances.
Enhance the memory allocation for Spark executors by adjusting the spark.executor.memory parameter to a higher value (a combined example is shown below).
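Taken together, these settings can be applied at the session level as in the following sketch; the executor count and memory values are illustrative, not recommendations:
set mapreduce.input.fileinputformat.split.maxsize=134217728;
set spark.executor.instances=10;
set spark.executor.memory=4g;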
Error Log:
FAILED: Execution ERROR, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timeout
Root Cause: The table most likely has too many partitions, so the drop operation takes a long time and the Hive Metastore client eventually times out reading from the network.
Solutions:
Increase the Hive Metastore client socket timeout:
hive.metastore.client.socket.timeout=1200s
Drop the partitions with a range condition rather than one by one:
alter table [TableName] DROP IF EXISTS PARTITION (ds<='20220720');
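If the table has a very large number of partitions, the drop can also be split into several smaller range batches so that no single call exceeds the timeout; the table name and dates below are illustrative:
alter table my_table DROP IF EXISTS PARTITION (ds<='20220331');
alter table my_table DROP IF EXISTS PARTITION (ds<='20220630');
alter table my_table DROP IF EXISTS PARTITION (ds<='20220720');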
Problem Description: select count(1) returns an incorrect result (for example, 0 on a table that is not empty).
Root Cause: The select count(1) query is answered from Hive table statistics, but the statistics for this table are inaccurate.
Solutions: Modify the configuration to disable the use of statistics.
hive.compute.query.using.stats=false
Or use the analyze command to recalculate the table statistics.
analyze table <table_name> compute statistics;
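For a partitioned table, statistics can also be recomputed per partition; the partition column ds below is illustrative:
analyze table <table_name> partition (ds) compute statistics;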
Symptoms: A join job runs slowly or fails with out-of-memory errors because the data is skewed: a small number of tasks process far more rows than the others.
Solutions:
Enable the skew join optimization:
set hive.optimize.skewjoin=true;
Increase the number of concurrent tasks by raising the value of spark.executor.instances.
Enhance the memory allocation for Spark executors by adjusting the spark.executor.memory parameter to a higher value (a combined example follows this list).
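The settings above can be combined in one session; the sketch below also includes hive.skewjoin.key, the row-count threshold Hive uses to decide that a join key is skewed (its default is 100000). The executor values are illustrative:
set hive.optimize.skewjoin=true;
set hive.skewjoin.key=100000;
set spark.executor.instances=10;
set spark.executor.memory=4g;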
Problem Description: After creating an external table, the query returns no data.
An example of an external table creation statement is as follows.
CREATE EXTERNAL TABLE storage_log(content STRING) PARTITIONED BY (ds STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION 'hdfs:///your-logs/airtake/pro/storage';
Query returns no data.
select * from storage_log;
Root Cause: Hive does not automatically discover or register the partition directories under the specified LOCATION; a partition is only visible after it has been added to the metastore.
Solutions:
Manually add the missing partition so that its directory is registered:
alter table storage_log add partition(ds='123');
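Alternatively, if the directories under the table location follow Hive's partition naming convention (here, subdirectories named ds=<value>), all of them can be registered in one step:
msck repair table storage_log;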
After the partition is added, the query returns data.
select * from storage_log;
The data returned is as follows.
OK
abcd 123
efgh 123