
BirchGrove Resources You Can Use: Thinking

Are structured data reasonably structured?

There's a wealth of available government procurement data out there. Millions of contract actions are reported yearly to the Federal Procurement Data System. Hundreds of thousands of solicitations and other notices reach us annually through the FedBizOpps platform. This wealth of procurement information represents another kind of "big data" challenge.

To its credit, the federal government has been a strong advocate of data transparency, consolidation of procurement data systems, and public access to the data. To enable that access, agencies are supposed to fit their data into an architectural framework and use the standard meanings and formats defined in data dictionaries.

But the multitude of contract writing systems, office automation suites, software connectors, and content management and data entry systems tends to make the data translucent rather than transparent.

Federal agencies are required by a White House mandate to regularly validate their data and report their findings. Despite these good intentions, certain fields are used inconsistently across and within agencies. The inconsistencies show up as improper values entered into fields, untidy cut-and-paste entries, misspellings, and differing uses of procurement terms. Search for "software" and you might miss "sw", "softw", and "softwr".
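The "software" example can be made concrete with a small sketch. The Python below is illustrative only: the variant map, the function names, and the sample descriptions are all hypothetical, and a real mapping would have to be built by inspecting the actual data. It shows how expanding known abbreviations before searching recovers records that a literal search would miss.

```python
import re

# Hypothetical variant map (an assumption for illustration); in practice
# this would be compiled by reviewing the values actually found in the data.
VARIANTS = {"sw": "software", "softw": "software", "softwr": "software"}


def normalize(text: str) -> str:
    """Lowercase the text and expand known abbreviations to a canonical term."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(VARIANTS.get(tok, tok) for tok in tokens)


def matches(description: str, term: str) -> bool:
    """Return True if the normalized description contains the term as a whole word."""
    return term in normalize(description).split()


# Made-up sample descriptions, not real procurement records.
descriptions = [
    "SW maintenance renewal",
    "Softwr licenses, annual",
    "Office furniture",
    "SOFTWARE support services",
]

hits = [d for d in descriptions if matches(d, "software")]
print(hits)
```

A plain substring search for "software" would find only the last matching record here; the normalized search also catches the "SW" and "Softwr" entries. The hard part in practice is knowing which variants belong in the map, which is where familiarity with the data comes in.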

For analysts attempting a thorough study of the data, this lack of structure means a herculean effort to bring the reported data into line with a set of standards. That's why you need someone with the context to conduct analysis tailored to your needs and to keep bias under control.

Yes, there are large sets of rich data out there that can yield the intelligence needed to make good business decisions. The challenge is in how to peer through the translucency and make sense of government data.