Compare two flat files with multiple columns

Category: sql server ssis

Question

ZBGmail on Mon, 02 Apr 2018 14:41:05


I want to compare two flat files (Day-1, Day-2) , pipe delimited with approximately 80 columns. I will have one column called accountId which will be unique.

For e.g.

Day-1 file has

AccountId|Fname|Lname|Address|City|State|Zip|Phone

1000|John|Doe|22 East Ave|Austin|TX|99088|223-445-6789

1001|Jane|Doe|22 East Ave|Austin|TX|99088|223-445-6789

1002|Mary|Doe|22 East Ave|Austin|TX|99088|223-445-6789

Day-2 file has

AccountId|Fname|Lname|Address|City|State|Zip|Phone

1000|John|Doe|24 East Ave|San Antonio|TX|68088|225-445-6789

1001|Jane|Doe|22 East Ave|Austin|TX|99088|225-445-6789

1003|Martin|Smith|44 Tulsa Ave|Dallas|TX|77898|112-445-6791


Here is what I would like output of the comparison 

1)File A with all new entries in Day-2 and not in Day-1. (will have a row for accountid 1003)

2)File B with all entries that are missing in Day-2 but were there on Day-1 (will have a row for accountid 1002)

3)File C -this is the tricky one-. This file will have one row per every change in record per accountid that it finds in Day-2 compared to Day-1. 

AccountId|Indicator|NewValue

1000|Addr|24 East Ave

1000|C|San Antonio

1000|Z|68088

1000|P||225-445-6789

1001|P||225-445-6789

*File c shows that there were 4 changes in accountid 1000 and 1 change in accountid 1001. Each change is reflected on a new row with some indicator(like P for a phone number change)

Thanks in advance. 


Replies

ArthurZ on Mon, 02 Apr 2018 14:47:30


Hi ZBGmail,

I would proceed as follows:

  1. dump the both files into a staging table (each)
  2. Run whatever SQL necessary to grab the diffs 1 to 3

Ashish Kumar Tiwari on Mon, 02 Apr 2018 18:48:25


Hi ZBGmail,

As suggested by Aurthur, do this in SQL. It would be lot easier and flexible to do that way. However, if you must do this in SSIS, then you can do it as mentioned below (Sample DFT logic shown in attached picture with all component configurations):

1- Create two flat file source, one for each source file i.e. Day-1, Day-2.

2- Go to advance editor, set IsSorted property of both sources to true and SortKeyPosition of AccountID to 1.

3- Drag and drop "Merge Join" component, select JOIN TYPE = "Full Outer Join" from the drop down on top.

4- Check "Join Key" check box against AccountId and check the front check box for all the columns in both sources.

5- Use a "Conditional Split" component to separate outputs for FileA, FileB and FileC. Expression given in the attached picture.

6- For each column that you want to check for difference for FileC, use a derived column as shown in picture.

7- Finally, merge the outputs and filter out unwanted rows.

8- Direct your output to file destination.


HTH,
Cheers!!
Ashish
Please mark it as Answered if it answered your question or mark it as Helpful if it helped you solve your problem.


Visakh16 on Mon, 02 Apr 2018 20:05:34


Once you get all file datas on  staging tables, then all you would require would MERGE statement logic to compare between them based on account id and get differences. Then you can insert them to any table you want and take appropriate actions like notification etc

ZBGmail on Tue, 03 Apr 2018 13:39:48


Thank you so much Ashish.. Really appreciate you taking so much time to explain the process. 

ZBGmail on Tue, 03 Apr 2018 13:40:53


Thank you Arthur for your response. I will give it a try.

ZBGmail on Tue, 03 Apr 2018 13:41:29


Thank you Visakh.

Ashish Kumar Tiwari on Tue, 03 Apr 2018 18:43:49


You're welcome. Please mark it as answer if it solved your problem.