Problem with Cleansing Phase Not Completing

Category: sql server dataquality


LCPHOTO8724 on Wed, 23 Dec 2015 17:57:27

I recently installed the Data Quality Services option for SQL Server 2014 Developer on my Windows 7 machine.  The install completed successfully and the configuration of the server went fine.  In DQS, I created a new knowledge base for cleaning a small sample Excel (.xlsx) spreadsheet containing 7 records.  I had no problems creating this knowledge base.  However, while building a new data quality project, I encountered a problem during the cleansing phase.  This phase just keeps on running without ever completing (showing 0% records).  I stopped it after a half-hour, then had to reboot my computer because the stopping process would not complete.  I tried using a .xls version of the Excel file, but that did not work.  I also tried running DQS as administrator, but it did not resolve the issue.

Any advice would be appreciated.  I have spent much time searching the Internet and have not found a solution.

Thank you.

Additional Info...

Referring to the Data Quality Section, the process looks like it should be completed since all fields that were checked have green bars indicating completeness in the Profiler section.  Also, the Stop button never changes to "Restart" (as would be expected) and % Cleansing Records remains stuck at 0%.


AV111 on Fri, 25 Dec 2015 12:38:46

What is your data?

are you using  NVARCHAR(MAX) and VARCHAR(MAX)  ?

Limitation with cleansing data columns of NVARCHAR(MAX) and VARCHAR(MAX) data types in the DQS Cleansing component in Integration Services

Data columns of the NVARCHAR(MAX) and VARCHAR(MAX) data types are not supported in the DQS Cleansing component in Integration Services. As such, these data columns are unavailable for mapping in the Mapping tab of DQS Cleansing Transformation Editor, and hence cannot be cleansed.

To fix this issue, before processing these data columns using the DQS Cleansing component, you must convert them to DT_STR or DT_WSTR data type using the Data Conversion transform.


LCPHOTO8724 on Thu, 31 Dec 2015 00:02:48

Thank you for your reply and advice.

I believe I found the problem.  It appears to be a performance issue with my Windows 7 computer and DQS.  One of the domains, US - Last Name, contains 151,672 values.  If I exclude it from the mapping used for the cleansing, the cleansing process will complete within a minute using the other domains each containing considerably fewer values: department has 22 values, department group has 8 values, gender has 7 values and first name has 8 values.  So, the US - Last Name domain is taking awhile to process. 

If there are any suggestions on how to improve DQS performance, it would be appreciated.

ShawshankSteve on Tue, 27 Sep 2016 08:47:31

Hi Dinesh,

Did you ever figure this out?  I have exactly the same problem.  I suspect that I'm also learning from the same book as you as I too have a spreadsheet with 7 rows.

I'm not sure if it would ever complete by the looks of it.  I find it odd that it behaves this way and I even upped the SQL Server memory but it made no difference.  It makes it look as though it just doesn't work or is so slow that it is unusable.  And this from a spreadsheet of 7 rows.  Can you imagine a much larger spreadsheet or a huge database?

Puzzled by this but just ignoring it and moving on but thanks for the tip that it was the Last Name.