Replacing the symbols (, ), {, }, \, @ with any specified delimiter in Pig (cleaning a data file into a desired format)
Sample records to clean:
(7,98243),7,34842,09K Video,1,Allen
(11,77623),11,34843,J Case 1500,2,John
(19,88734),19,34857,DVD J INT,7,Hubert
(24,45641),24,34856,500 GB HD T,5,Roger
(47,92387),47,34854,J Power 300W,1,Cliff
(64,77624),64,34833,J Case 1501,17,Mello
(64,92387),64,34847,J Power 300W,4,Mello
cli>pig -x local
--load each entire record as a single field, line, of datatype chararray
--loads the maxtemp* files in path /home/hadoop/datasets
grunt>fstep1 = load '/home/hadoop/datasets/maxtemp*' using PigStorage('\n') as (line:chararray);
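--each step replaces one symbol with the chosen delimiter (here ' ,'); the second argument of REPLACE is a regular expression, so special characters are escaped with \\
--'\\/' matches a forward slash; to match a literal backslash the pattern would be '\\\\'
grunt>r1 = foreach fstep1 generate REPLACE(line, '\\/' , ' ,') as (line:chararray);
grunt>r2 = foreach r1 generate REPLACE(line, '\\(' , ' ,') as (line:chararray);
grunt>r3 = foreach r2 generate REPLACE(line, '\\)' , ' ,') as (line:chararray);
grunt>r4 = foreach r3 generate REPLACE(line, '\\{' , ' ,') as (line:chararray);
grunt>r5 = foreach r4 generate REPLACE(line, '\\}' , ' ,') as (line:chararray);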
grunt>store r5 into '/user/hadoop/pigresults' using PigStorage(',');
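
The five REPLACE steps above can likely be collapsed into one pass, because the pattern argument is a regular expression and a character class can list all the symbols at once. The following is only a sketch under that assumption, not a tested replacement for the steps above: it reuses the same input path and delimiter, and the output directory pigresults_onepass is a hypothetical name chosen so it does not collide with the existing results directory.

```pig
-- Load each record as a single chararray field, as in the steps above.
fstep1 = load '/home/hadoop/datasets/maxtemp*' using PigStorage('\n') as (line:chararray);

-- Inside a character class [...] the symbols ( ) { } / are literal,
-- so no backslash escaping is needed here.
-- Every occurrence of any listed symbol is replaced with ' ,'.
cleaned = foreach fstep1 generate REPLACE(line, '[(){}/]', ' ,') as (line:chararray);

-- Hypothetical output directory; Pig fails if the directory already exists.
store cleaned into '/user/hadoop/pigresults_onepass' using PigStorage(',');
```

To clean a different set of symbols, add them to or remove them from the character class; to use a different delimiter, change the third argument of REPLACE.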