This is a test repository for Hive functions. It contains three kinds of functions:
- UDF: User-Defined Functions
- UDAF: User-Defined Aggregate Functions
- UDTF: User-Defined Table-Generating Functions
The repository follows the standard Maven directory layout.
If you need a Hadoop and Hive environment, see my other repository, docker hadoop environment.
rm -rf metastore_db/
schematool -initSchema -dbType derby
hive
In the Hive console:
hive> create database studentdb;
hive> use studentdb;
hive> create table student(id int, name string, sex string, age int, department string) row format delimited fields terminated by ",";
# Copy data/student.txt to a local path. "/root/student.txt" is an example path; replace it with your own.
hive> load data local inpath "/root/student.txt" into table student;
Then run the following:
hive> select * from student;
OK
95002 刘晨 女 19 IS
95017 王风娟 女 18 IS
95018 王一 女 19 IS
......
Install Maven.
A UDF works on a single row of a table and produces a single row as output. There is a one-to-one relationship between the input and the output of the function.
Takes an int a and returns a * 2.
src/main/java/com/example/MyUDF.java
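The one-to-one behaviour described above can be sketched in plain Java. This is only an illustration of the evaluate logic, not the contents of MyUDF.java: the class name DoubleSketch is hypothetical, no Hive classes are used so it compiles without hive-exec, and returning null for a NULL input is an assumption.

```java
// Hypothetical stand-in for the evaluate logic of a doubling UDF.
// No Hive classes are used, so this compiles without hive-exec.
class DoubleSketch {
    // One input value in, one output value out: returns a * 2.
    // A null input (SQL NULL) is passed through as null (assumption).
    static Integer evaluate(Integer a) {
        if (a == null) {
            return null;
        }
        return a * 2;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(19));   // 38
        System.out.println(evaluate(null)); // null
    }
}
```

In a real Hive UDF the same logic would live in the evaluate method that Hive calls once per row.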
Concatenate two strings.
src/main/java/com/example/ConcateUDF.java
Here we use org.apache.hadoop.hive.ql.udf.generic.GenericUDF, which supports complex types: array, map, struct, and uniontype.
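A GenericUDF typically validates its arguments once and then evaluates each row. The plain-Java sketch below imitates that split for a two-string concatenation. It does not use the Hive API: ConcatSketch, checkArity, and the rule that a null argument yields null are all assumptions made for illustration.

```java
// Hypothetical sketch of a two-argument concatenation, written without Hive classes.
class ConcatSketch {
    // Imitates the one-time argument check a GenericUDF would do before any row is processed.
    static void checkArity(Object[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "expected exactly 2 arguments, got " + args.length);
        }
    }

    // Imitates the per-row evaluation: joins the two arguments as strings.
    // If either argument is null, the result is null (assumption).
    static String evaluate(Object[] args) {
        checkArity(args);
        if (args[0] == null || args[1] == null) {
            return null;
        }
        return args[0].toString() + args[1].toString();
    }

    public static void main(String[] args) {
        System.out.println(evaluate(new Object[] {"alice", "Bob"})); // aliceBob
    }
}
```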
User-Defined Aggregation Functions (UDAFs) are an excellent way to integrate advanced data-processing into Hive. Hive allows two varieties of UDAFs: simple and generic. Simple UDAFs, as the name implies, are rather simple to write, but incur performance penalties because of the use of Java Reflection, and do not allow features such as variable-length argument lists. Generic UDAFs allow all these features, but are perhaps not quite as intuitive to write as Simple UDAFs.
src/main/java/com/example/MyUDAFAve.java
src/main/java/com/example/MyUDAFCount.java
This repository provides the MyUDAFAve and MyUDAFCount functions for testing. For details, see Writing GenericUDAFs: A Tutorial.
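To make the aggregation flow concrete, here is a plain-Java sketch of an average that follows the init / iterate / terminatePartial / merge / terminate steps used by simple UDAF evaluators. It is not the code in MyUDAFAve.java and uses no Hive classes; AverageSketch and Partial are hypothetical names, and skipping NULL inputs is an assumption.

```java
// Plain-Java sketch of a UDAF-style average. No Hive classes are used;
// AverageSketch and Partial are hypothetical names.
class AverageSketch {
    // Partial aggregation state handed from one evaluator to another.
    static final class Partial {
        long count;
        double sum;
    }

    private final Partial state = new Partial();

    // Reset the state before a new group is aggregated.
    void init() {
        state.count = 0;
        state.sum = 0.0;
    }

    // Consume one input row; null inputs are skipped (assumption).
    boolean iterate(Integer value) {
        if (value != null) {
            state.count++;
            state.sum += value;
        }
        return true;
    }

    // Hand over a copy of the partial state, or null if no rows were seen.
    Partial terminatePartial() {
        if (state.count == 0) {
            return null;
        }
        Partial copy = new Partial();
        copy.count = state.count;
        copy.sum = state.sum;
        return copy;
    }

    // Fold another evaluator's partial state into this one.
    boolean merge(Partial other) {
        if (other != null) {
            state.count += other.count;
            state.sum += other.sum;
        }
        return true;
    }

    // Produce the final average, or null for an empty group.
    Double terminate() {
        if (state.count == 0) {
            return null;
        }
        return state.sum / state.count;
    }

    public static void main(String[] args) {
        AverageSketch a = new AverageSketch();
        AverageSketch b = new AverageSketch();
        a.init();
        b.init();
        a.iterate(18);
        a.iterate(20);
        b.iterate(22);
        a.merge(b.terminatePartial()); // combine the two partial results
        System.out.println(a.terminate()); // 20.0
    }
}
```

Keeping the sum and the count separately, rather than a running average, is what lets two partial results be merged without losing accuracy.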
User-defined table functions are represented by the org.apache.hadoop.hive.ql.udf.generic.GenericUDTF abstract class. A UDTF can output multiple rows and multiple columns for a single input row.
Given a Chinese name, return the first name and the last name.
src/main/java/com/example/MyUDTF.java
- Specify the input and output parameters:
abstract StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException;
- Process an input record and write out any resulting records:
abstract void process(Object[] record) throws HiveException;
- Called to notify the UDTF that there are no more rows to process. Clean-up code or additional output can be produced here:
abstract void close() throws HiveException;
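The three steps above can be imitated in plain Java. The sketch below is not MyUDTF.java and does not extend GenericUDTF: NameSplitSketch, rows, and the column names are hypothetical, the list stands in for the rows a real UDTF would forward to Hive, and splitting after the first character is an assumption that does not handle two-character surnames.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java imitation of the initialize / process / close steps.
class NameSplitSketch {
    // Stand-in for the rows a real UDTF would forward to Hive.
    private final List<String[]> forwarded = new ArrayList<>();

    // Plays the role of initialize(): declares the output columns
    // (the column names here are made up for the sketch).
    String[] initialize() {
        return new String[] {"first_part", "second_part"};
    }

    // Plays the role of process(): emits one two-column row per input name.
    // Assumes the first character is one part and the rest is the other.
    void process(String fullName) {
        if (fullName == null || fullName.isEmpty()) {
            return; // nothing to emit for an empty input
        }
        int split = fullName.offsetByCodePoints(0, 1);
        forwarded.add(new String[] {
            fullName.substring(0, split),
            fullName.substring(split)
        });
    }

    // Plays the role of close(): nothing to clean up in this sketch.
    void close() {
    }

    List<String[]> rows() {
        return forwarded;
    }

    public static void main(String[] args) {
        NameSplitSketch udtf = new NameSplitSketch();
        udtf.initialize();
        udtf.process("\u5218\u6668"); // the name used in the sample data
        udtf.close();
        for (String[] row : udtf.rows()) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

The escaped string "\u5218\u6668" is the name 刘晨 from the sample data; the sketch splits it into 刘 and 晨.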
For more information, see Write UDTF.
Change to the repository directory and run the following command:
mvn package
You will now find a target folder in the working directory. Copy target/hadooptrain-1.0-SNAPSHOT.jar to the machine where Hive runs.
In the Hive shell:
add jar target/hadooptrain-1.0-SNAPSHOT.jar;
# create temporary function {function_name} as 'com.example.{Java_class_name}';
create temporary function myudf as 'com.example.MyUDF';
create temporary function myudf_concate as 'com.example.MyUDFConcat';
create temporary function myudaf_ave as 'com.example.MyUDAFAve';
create temporary function myudaf_count as 'com.example.MyUDAFCount';
create temporary function myudtf as 'com.example.MyUDTF';
use studentdb;
# myudf: age * 2
select age,myudf(age) from student;
# myudf_concate: concatenate "alice" and "Bob"
select myudf_concate("alice","Bob");
# myudaf_ave: average age per department
select department,myudaf_ave(age) from student group by department;
# myudaf_count: number of students per department
select department,myudaf_count(name) from student group by department;
# myudtf: "刘晨" is split into "刘" and "晨"
select t.name,t.surname from student lateral view myudtf(name) t as name,surname;
exit;