During one of my assignments I have come across a situation where we need to fix data issues in SharePoint Lists.
One of the issues that we found was presence of duplicate data. In order to fix that problem in hand I had developed a Powershell Script to find out duplicate data based on a specific or a group of columns.
For the sake of demo, I have added a SharePoint List with some duplicate records in it as shown below:
Now let’s look into the code to understand implementation details-
In Step 1 we are getting references of the Site and Web where the SharePoint List resides
In Step 2 we are splitting the list of columns based on which we want to find out the duplicate data
We can see there are two input variables “ColumnToValidate” and “ColumnToDisplay”. “ColumnToValidate” provides columns based on which duplicity needs to be checked while “ColumnToDisplay” contains the list of columns that needs to be the part of data export.
In Step 3 we are creating the export folder that will hold the CSV files exported with duplicate records
In Step 4 we are creating the list object that will give the handle on the list which needs to be validated
In Step 5 we are getting list of Items from SharePoint List and grouping them based on the validation columns
In Step 6 we are creating the directory for export files
In Step 7 we are exporting all the groups which is having item count greater than 1 (this logic identifies the duplicate items)
That is all for the code.
Now we will see the variation in outputs depending on the columns specified for duplicacy check
In Step 8 we specify the validation and display columns, for the first execution we will check duplicate values in “Title” column
In Step 9 we are calling the “Get-DuplicateListItems” function to find the duplicate values
After the function executed successfully we can see the following output.
In Step 10 we can see the output of this excution and can see 6 items found which duplicate in Title Column
In Step 11 we can see the CSV file that is exported by the execution considering “Title” Column to be validated.
In Step 12 we can see the output file and can notice duplicate values in “Title” Column
In Step 13 we have changed the list of columns to be validated. In this second execution I have added another column “Role”.
Now the list will be validated for duplicity based on the combination of “Title & Role” Columns
In Step 14 we can see the output of this excution and can see 4 items found which duplicate in “Title & Role” Columns
In Step 15 we can see the CSV file that is exported by the execution considering “Title & Role” Column to be validated
In Step 16 we can see the output file and can notice duplicate values in “Title & Role” Column
In Step 17 we have changed the list of columns to be validated. In this second execution I have added another column “Location”.
Now the list will be validated for duplicity based on the combination of “Title & Role & Location” Columns
In Step 18 we can see the output of this excution and can see 2 items found which duplicate in “Title & Role & Location” Columns
In Step 19 we can see the CSV file that is exported by the execution considering “Title & Role & Location” Column to be validated
In Step 20 we can see the output file and can notice duplicate values in “Title & Role & Location” Column
This is a very simple technique that can be used to fix one of the issues with SharePoint List data.
Hope you find it helpful.
Why go through all those contortions? Just export the list to Excel and manipulate the data there, where it’s designed to be manipulated.
_____________________________________________________________________
Scot Witt
Business Analyst-Lead Consultant II
Workforce Technology Solutions- User Productivity Solutions Engineering
Knowledge Management
Scot, this process has been designed to remove all manual steps.
This solution was the part of a big picture with lot of moving parts running based on inputs of one process serving output for another.
The framework that I wrote for that business problem was responsible to find and fix various types of data issues with multiple SharePoint Lists without any manual intervention at all.
I believe that’s we consider as Process Automation. 🙂