It has been a busy summer
Antoine Bello · September 19, 2021
It has been a busy summer at the Population Project. After the initial sprint, we’re settling into a marathon pace more suited to our time horizon, which is measured in years, if not decades. We have more clarity on the challenges ahead:
Gathering lists is relatively easy in some countries (the US, France), much more difficult in others (Germany, Peru). Some countries have the good sense to separate first, middle, and last names, while others only display a full name that is sometimes difficult to break down. Hiring and coordinating volunteers has proved trying, leading us to target fewer countries, in which we hire highly productive freelance web researchers. It’s still a bit early to discuss metrics, as numbers vary greatly from one week to the next. We can find the names of 20 million Mexican public employees on Monday and collect only 500,000 names the rest of the week. Overall though, we think we’ve only seen the tip of the proverbial iceberg.
Processing lists is less rewarding than finding them but just as important. Most lists come in PDF format. They need to be converted and purged of any unnecessary or redundant information - a process that takes anywhere between two minutes and one hour per list. As we can’t keep up with all the lists we receive, we’ve decided to outsource part of our workflow to a BPO (business process outsourcing) firm based in the Philippines and Madagascar. The first tests will begin next week. Being able to count on a capable and experienced partner should free up some of our resources.
IT remains the biggest obstacle we face. The size of the database we plan on building doesn’t allow for shortcuts. Slight design mistakes might prove catastrophic when we have one or five billion records. After initial development in Postgres, we scrapped our entire codebase and started anew on 4D, a powerful yet agile relational database system. The problems (comparing humans, merging records, etc.) haven’t gone away but they’re easier to frame. We won’t publish a beta version of the site as originally planned but hope to have a prototype sometime during the first half of 2022.
As always, we welcome your feedback. If you feel like helping in any capacity - be it to collect lists, process them, or design our website - please contact me. One last ask: please share this post on your LinkedIn or Facebook page - we haven’t had time to hire a PR firm yet!