The art of collecting, cleaning and storing small data. Part 1.
In our last blog, we presented a data definition framework. Among all data categories from internal and external sources, human generated data may be crucial for a company. Also, even in the era of Big Data spreadsheets are still very commonly used in small companies and corporations. In other words, there are millions of people collecting and working with the small data type of data which can be easily fitted in an Excel spreadsheet. For instance, from this video , you can learn few interesting facts about the scope of using spreadsheets. Speaking of this video, it is one of the educational videos from the great MOOC by Delft University of Technology called "Data Analysis: Take it to the MAX()". The instructor emphasised that typically one spreadsheet may live for five years and be used by twelve workers. A lot of Flexmonster Pivot Table Component users work with Excel spreadsheets and small data too. Thus, in the next blogs, I will concentrate on useful tips on collecting, storing and cleaning small data.
Use a unique ID column
Probably it sounds obvious and simple, but it's still very effective and may help you avoid a lot of troubles in the future. Anytime you need to restore the original order you can do it with ease.
Use the first row for variable name and one column for one variable
It may be important as it is a standard for a lot of stats programs. By doing this you also make your further work less exposed to errors. You can create any complex variable on the fly if your spreadsheet is organised in a simple but smart way.
Organise your empty cells
For any sort analysis, it is crucial to have data in every cell. However, it is a common situation when you have to wait for more information. Also, when you have a lot of colleagues entering new data it is likely to fall into a trap of messy and incomplete data. Imagine all possible mistakes throughout doing analysis if you have zeros in the cells where no data supposed to be. For a lot of people, zero means nothing, no data, but zero is a real number which can immediately spoil your analysis.
There are a few techniques to handle with empty cells. Important thing is that you should pick some code system for incomplete cells so you will always know what this particular empty cell means. Moreover, you should keep your code system in a different spreadsheet. Thus, in a month or two you will be able to understand all these illegal or weird numbers you used previously. While working on a spreadsheet it's better to rely on your notes, not on your memory. Also, it can be very helpful for your colleagues who might work with your spreadsheet the other day.
Use excel functions
Probably, you've met spreadsheets with a lot of manual entries on the places where functions could be used. Beforehand we've mentioned practical online course which can help to fill some gaps.
Lastly, when working with a spreadsheet try to keep all your data on a single spreadsheet. Use different tabs for raw data, for codes etc, but keep in all simple.
To sum up, all listed tips can help to prevent errors at the next stages before analysis such as storing and cleaning your spreadsheet.