Weather

Cleaning a data file that contains braces, brackets, or other symbols by replacing them with commas (saving it as a comma-delimited file)

Replacing symbols such as (, ), {, }, \ and @ with any specified delimiter in Pig (cleaning a data file into the desired format)


Sample records to clean:
(7,98243),7,34842,09K Video,1,Allen
(11,77623),11,34843,J Case 1500,2,John
(19,88734),19,34857,DVD J INT,7,Hubert
(24,45641),24,34856,500 GB HD T,5,Roger
(47,92387),47,34854,J Power 300W,1,Cliff
(64,77624),64,34833,J Case 1501,17,Mello
(64,92387),64,34847,J Power 300W,4,Mello

cli>pig -x local
--loading each entire record as a single field named line, of datatype chararray
--loading the maxtemp* files in path /home/hadoop/datasets

grunt>fstep1 =  load '/home/hadoop/datasets/maxtemp*' using PigStorage('\n') as (line:chararray);

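--REPLACE treats its second argument as a regular expression, so each symbol is escaped with \\
--note: the pattern '\\/' matches a forward slash (/)
grunt>r1 = foreach fstep1 generate  REPLACE(line, '\\/' , ' ,') as (line:chararray);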
grunt>r2 = foreach r1 generate  REPLACE(line,  '\\(' ,  ' ,') as (line:chararray);
grunt>r3 = foreach r2 generate  REPLACE(line,  '\\)' ,  ' ,') as (line:chararray);
grunt>r4 = foreach r3 generate  REPLACE(line,  '\\{' ,  ' ,') as (line:chararray);
grunt>r5 = foreach r4 generate  REPLACE(line,  '\\}' ,  ' ,') as (line:chararray);

grunt>store  r5 into '/user/hadoop/pigresults' using PigStorage(',');
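
The five REPLACE steps above could probably be collapsed into a single pass, since REPLACE accepts a regular expression. The following is only a sketch under some assumptions: every symbol is replaced with a plain ',' (no leading space, unlike the steps above), and the relation name cleaned and the output path pigresults_oneshot are made up for illustration.

```pig
-- Sketch only: one character class stands in for the chain of REPLACE calls.
-- Assumption: each listed symbol becomes a plain ',' (no leading space).
-- The relation name cleaned and the output path pigresults_oneshot are hypothetical.
fstep1 = load '/home/hadoop/datasets/maxtemp*' using PigStorage('\n') as (line:chararray);

-- inside [...] the characters ( ) { } / are literal, so no \\ escaping is needed
cleaned = foreach fstep1 generate REPLACE(line, '[(){}/]', ',') as line:chararray;

store cleaned into '/user/hadoop/pigresults_oneshot' using PigStorage(',');
```

With this pattern, the first sample record (7,98243),7,34842,09K Video,1,Allen would come out as ,7,98243,,7,34842,09K Video,1,Allen. The leading comma and the double comma are empty fields, so whatever reads the cleaned file needs to allow for them. Other symbols, such as @, could be added inside the square brackets.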







