I have written a utility that pulls the data from each table in a database, loops through it to cleanse any NPI/PII, and then saves the cleansed data to .sql files as insert statements. The utility is written in C# and uses a SqlDataReader to pull the data into the application. My biggest problem right now is that one of the tables has a column holding XML (the column itself is defined as nvarchar(max), not XML), and the XML packets contain NPI/PII. To wipe that data I have a .txt file listing every XPath that can contain it. I read the .txt file into memory once, and then for each row I loop through all of the XPaths and cleanse whatever nodes are found. Needless to say, this takes a lot of time, because the table with the XML is rather large. Is there a better way to do this that would improve the performance of the utility?
The code that cleanses the packet is:
    private static XmlNode CleanTxn(XmlDocument node, string[] xPaths)
    {
        XmlNamespaceManager nsmgr = new XmlNamespaceManager(node.NameTable);
        nsmgr.AddNamespace("a", "http://schemas.bankerssystems.com/2004/ExpereTxn");
        XmlElement root = node.DocumentElement;

        foreach (XmlNode xn in from xPath in xPaths
                               select xPath.Replace("/", "/a:") into xp
                               select root?.SelectNodes(xp, nsmgr) into nodeList
                               where nodeList?.Count > 0
                               from XmlNode xn in nodeList
                               select xn)
        {
            xn.InnerText = Utilities.CleanString(xn.InnerText);
        }

        return node;
    }
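To make the per-row work concrete, here is a minimal, self-contained sketch of what happens to a single packet. The sample XML, the single XPath, and the `Mask` helper (a stand-in for `Utilities.CleanString`) are made up for illustration. Only the namespace URI and the `Replace("/", "/a:")` prefixing come from the method above.

```csharp
using System;
using System.Xml;

public static class CleanTxnSketch
{
    // Namespace URI taken from the CleanTxn method above.
    private const string TxnNs = "http://schemas.bankerssystems.com/2004/ExpereTxn";

    // Hypothetical stand-in for Utilities.CleanString: masks every character.
    public static string Mask(string value) => new string('X', value.Length);

    // Parses one packet, cleanses every node matched by the XPaths, returns the XML.
    public static string CleanPacket(string xml, string[] xPaths)
    {
        // The column is nvarchar(max), so each row has to be parsed from a string.
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("a", TxnNs);

        foreach (string xPath in xPaths)
        {
            // "/Txn/Borrower/SSN" becomes "/a:Txn/a:Borrower/a:SSN".
            string prefixed = xPath.Replace("/", "/a:");

            // One XPath evaluation per path, per row: this is the expensive part.
            XmlNodeList matches = doc.DocumentElement.SelectNodes(prefixed, nsmgr);
            foreach (XmlNode match in matches)
            {
                match.InnerText = Mask(match.InnerText);
            }
        }

        return doc.OuterXml;
    }

    public static void Main()
    {
        // Made-up packet; the real element names are different.
        string xml = "<Txn xmlns=\"" + TxnNs + "\">"
                   + "<Borrower><SSN>123456789</SSN><Loan>42</Loan></Borrower></Txn>";
        string[] xPaths = { "/Txn/Borrower/SSN" };

        Console.WriteLine(CleanPacket(xml, xPaths));
    }
}
```

In this sketch the text of the SSN element is masked and the Loan element is left alone. With N rows and M XPaths, the utility does N document parses and N × M XPath evaluations, which is where I assume most of the time goes.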