Munging is the standard definition for irrevocably changing or damaging data beyond its original state. The term is thought to have originated as a backronym for “Mash Until No Good”.
However, when referring specifically to Data Munging (or Data Wrangling), it means preparing your data for a dedicated purpose - taking the data from its raw state and transforming and mapping into another format, normally for use beyond its original intent.
Often ‘raw’ data can be hard, even impossible, to analyse and gain useful insights from. This is where somebody will transform the data entries, fields, rows and columns into a more useful format. Activities to achieve this might include:
The final data can then be sent to the relevant data analyst or stored, ready to be analysed at a later date.
A specific example of data munging might be used in Machine Learning, in order to restructure data in a way that could be used by a learning algorithm.
A common example of damaging data is with email addresses. Typically, to prevent spam, a user will destroy the valid format of an email address by writing it in a way that humans understand but computers do not, such as:
JohnDOTdoeATJohnDoeDOTcom or John(dot)doe(at)John(dot)doe(dot)com
Read our Experian Pandora guide today to find out how it's standardising, cleansing and matching capabilities can help with the data munging project you are working on.