In many small file scenarios, Spark will start many tasks. When there is a Shuffle operation in the SQL logic, the number of hash buckets will be greatly increased, which will seriously affect the performance. In Fusioninsight, scenarios for small files usually use the( )Operator to merge partitioni generated by small files in Tabler, reduce the number of partitions, avoid generating too many hash buckets during shuffle, and improve performance?
- group by
- coalosce
- onnect
- join
Reveal Solution
Next Question