Volunteer Handbook
The Population Project
Antoine Bello · May 24, 2021
What is The Population Project?
The Population Project is a non-profit organization whose goal is to list the full name and date of birth of every human alive. Amazingly, in a time when information has never been so plentiful, we don’t know the names of the 7+ billion people with whom we share this planet. Developed countries release high-level demographic data that doesn’t provide a granular image of their population, while some developing countries are simply too poor to run censuses.
Why the Population Project? Because all lives deserve to be acknowledged. It is estimated that more than one billion people (15% of the world population!) don't have any form of ID. Most of them don’t know exactly where and when they were born. They leave no trace, and their names cannot be found on electoral lists or social security registers. By keeping a log of who lived when, the Population Project hopes to correct this injustice: queen or beggar, everybody's passage on Earth will, at the very least, be documented. The Population Project's register will serve as a resource for historians and genealogists alike.
The Population Project has no political agenda, yet the very notion of a public roll of humans makes it more difficult to exterminate minorities or pretend they never existed in the first place. We might not know the names of all the past victims of oppression, but we’ll remember those of the men and women currently being persecuted. Needless to say, the Population Project gives equal weight to all languages and cultures: a human is a human, whether born in New York, Islamabad or Flekkefjord.
By its very nature, the Population Project is an ongoing and never-ending endeavor, if only because 300,000 babies are born every day. Our data will never be perfect. At any given time, it will be full of mistakes, blanks and duplicates. In other words, it will live and breathe, like the organism it reflects.
What The Population Project is not
The Population Project is not and never will be a commercial enterprise. The only data we collect is displayed on the site and is limited to a small number of fields: full name, sex, date and place of birth. If we stumble upon additional information, we don’t log it. Race, religion, sexual orientation, education, and income level are irrelevant to us.
By the same token, we don’t record the online activity of our visitors. We have adopted Wikipedia’s privacy policy. We only log your IP address if you add or edit a human’s record, so we are able to reverse the vicious acts of a troll. If having your IP address registered makes you uncomfortable, you can create an anonymous account (i.e. without providing your email address) and we’ll stop logging your IP address. Your privacy is our priority.
Why we need volunteers
Calling the Population Project an ambitious initiative might be the understatement of the year! We need to identify 7 or 8 billion people in countries with highly different linguistic structures and record-keeping traditions.
**1. Source lists** Most volunteers at the Population Project help by pointing us to online lists of names from their country of origin. It is detective work that requires creativity, as the best sources are not always the most obvious ones. Lots of examples are provided at the end of this guide. Once you’ve found a list, you can send it to us on our website (see screenshot below). We will vet it, download it, clean it to our rigorous standards and import it into our database.
You're more than welcome to scrape lists and convert them to Excel yourselves. For tips and guidelines, please refer to The Scraping Handbook.
**2. Edit The Population Project's website** When we publish our database, you will also be able to create and edit records, pretty much like on Wikipedia. A simple tool will allow you to notify your friends and family that you just created their record, thus contributing to the site's virality.
**3. Special projects** Last but not least, if you have special skills in specific fields such as (but not limited to) data science, search engine optimization, linguistics, online marketing or machine learning, we’d love to hear from you too, as we face many challenges that require very particular expertise.
No commitment required
Volunteers are fully remote. You can work from anywhere as long as you have access to a computer and an internet connection.
There are no time requirements either. You are free to do as few or as many hours as you like. We are grateful for any help you can provide. A certificate for your service will be issued to you upon request.
Unfortunately, we are not able to pay you for your work, as we are a non-profit organization operating on a tight budget.
Which lists are OK and which are not?
Every once in a while, Facebook’s or some credit card company’s servers are raided by hackers who steal the personal data of hundreds of millions of people. These files are later put up for sale on the dark web. Needless to say, this is not the kind of data we’re after.
We go by a simple rule: whatever is accessible online without logging in is fair game. A few examples:
- the list of delegates on an international conference’s website;
- the names and birth years of the best Australian swimmers in the Under-12 category;
- the full identity of people who just acquired French citizenship;
- the list of inmates in the State of Nebraska;
- the list of certified attorneys in Ghana;
- the recipients of military medals in Brazil, etc.
Remember, the names you harvest will be decontextualized as they enter the Population Project’s database. Mustafa Kipembe, born on 4/23/1991, might be on Nigeria's most wanted list, or he might be a chess grandmaster. We don’t care: to us, he’s just a human.
If you need to register to see or download a list, then consider it off-limits. As a rule, we do not use social networks. If you were tempted to harvest data on Facebook, consider the following:
- You don’t have to prove your identity to create a profile page.
- Social networks are littered with fake profiles. Facebook alone deletes over 1 billion (with a B) fake accounts every quarter.
- The information on social networks doesn’t meet the Population Project’s rigorous standards. Robert becomes “Bob”, no one lists their middle names, etc. Among social networks, those promising to reunite you with your old buddies from school (classmates.com, copainsdavant.fr...) are of the poorest quality.
To efficiently retrieve large quantities of names, we use a variety of web scraping tools. Scraping is entirely legal, as long as it is used to speed up the data collection process. We could copy the name of every marathon runner one by one, but we’d much rather capture them in a flash!
You can refer to our Scraping Handbook.
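To give a feel for what a scraper does, here is a minimal sketch in Python that uses only the standard library (`html.parser` and `csv`) to turn the rows of an HTML table into CSV text that Excel can open. It assumes the page is publicly accessible without logging in, has already been saved locally, and contains a plain, well-formed `<table>`. The names `TableExtractor`, `extract_rows` and `rows_to_csv` are hypothetical. Real sites usually need the site-specific handling covered in the Scraping Handbook.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row.

    Sketch only: assumes rows are explicitly opened with <tr> and closed with </tr>.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace so "Ana   Lopez" becomes "Ana Lopez".
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()      # flush a cell whose </td> was omitted
            if self._row:           # skip empty rows
                self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table rows found in an HTML string, as lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize rows to CSV text (csv.writer uses \\r\\n line endings by default)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()
```

To produce a file for Excel, write the result with `open("list.csv", "w", newline="", encoding="utf-8-sig")`; the `utf-8-sig` encoding adds a byte-order mark that helps Excel display accented names correctly.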
What makes a good list?
The best lists are:
- long (some can run in the millions of names);
- official (names are more likely to be correctly spelled, birth dates can be trusted, etc.);
- in an easily retrievable format, such as a clean table or a neat PDF file.
Typical factors to take into consideration are:
- Are names separated? For processing purposes, "Michael / Garcia / Velazquez" is better than semi-bundled "Michael Garcia / Velazquez" and much better than bundled "Michael Garcia Velazquez."
- Are women’s last names their married or maiden names? This is why we like lists of children or young students: their last names and maiden names are the same.
- If birth dates are not indicated, can we guess the year of birth by the context of the list? Depending on the country, high school graduates are usually between 16 and 20. Swimmers in the under-14 category are 12, 13, or 14. And so on.
- Does the list include the sex? If not, our algorithms will make an educated guess based on the first name but it will always be less accurate than the explicit data.
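Guessing a year of birth from context comes down to simple arithmetic. Someone who is A years old on the day of an event held in year Y was born in Y - A if their birthday had already passed, or in Y - A - 1 if it had not, so an age bracket translates into a range of birth years. The Python helper below is a hypothetical sketch of that reasoning, not the Population Project's actual algorithm; it assumes ages are measured on the event date. Where a federation defines its categories by year of birth, the range is narrower.

```python
def birth_year_range(event_year, min_age, max_age):
    """Return (earliest, latest) plausible birth years for a person whose
    age on the event date was between min_age and max_age, inclusive.

    A person aged A during event_year was born in event_year - A if their
    birthday had already passed, or in event_year - A - 1 otherwise.
    """
    if min_age > max_age:
        raise ValueError("min_age must not exceed max_age")
    earliest = event_year - max_age - 1  # oldest case: birthday not yet reached
    latest = event_year - min_age        # youngest case: birthday already passed
    return earliest, latest


# Under-14 swimmers aged 12 to 14 at a 2019 meet.
print(birth_year_range(2019, 12, 14))  # (2004, 2007)
# High school graduates aged 16 to 20 in 2019.
print(birth_year_range(2019, 16, 20))  # (1998, 2003)
```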
What makes a bad list?
Some lists cannot be used because:
- they're scanned lists, not original PDFs.
- the tables are too compact, with text overflowing from one column to the next.
- they're too informal, using nicknames or containing obvious typos.
- they only include bundled names ("John Smith", "Mario Lopez") without any indication of date or year of birth.
What information do we collect?
- first name;
- middle name (or initial);
- last name;
- maiden name;
- sex at birth (“M” or “F”);
- date of birth or, if not available, year of birth;
- place of birth;
- country (using ISO 3166-1 alpha-3 codes: “ARG” for Argentina, “DEU” for Germany, etc.).
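One way to picture a record built from these fields is as a small Python data class. This is a sketch under our own assumptions: the field names, the validation rules and the deliberately tiny set of country codes are illustrative, not the Population Project's actual database schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Illustrative subset of ISO 3166-1 alpha-3 codes; a real pipeline would load the full list.
KNOWN_COUNTRIES = {"ARG", "BRA", "DEU", "FRA", "GHA", "NGA", "USA"}


@dataclass
class PersonRecord:
    first_name: str
    last_name: str
    country: str                        # ISO 3166-1 alpha-3, e.g. "ARG"
    middle_name: Optional[str] = None   # full middle name or initial
    maiden_name: Optional[str] = None
    sex: Optional[str] = None           # sex at birth: "M" or "F"
    birth_date: Optional[date] = None
    birth_year: Optional[int] = None    # used on its own when the full date is unknown
    birth_place: Optional[str] = None

    def __post_init__(self):
        if self.sex not in (None, "M", "F"):
            raise ValueError(f"sex must be 'M' or 'F', got {self.sex!r}")
        if self.country not in KNOWN_COUNTRIES:
            raise ValueError(f"unknown alpha-3 country code {self.country!r}")
        if self.birth_date is not None:
            if self.birth_year is None:
                # Derive the year from the full date when only the date was given.
                self.birth_year = self.birth_date.year
            elif self.birth_year != self.birth_date.year:
                raise ValueError("birth_year contradicts birth_date")
```

For example, `PersonRecord("Mustafa", "Kipembe", "NGA", sex="M", birth_date=date(1991, 4, 23))` yields a record whose `birth_year` is 1991 (the country code here is chosen purely for illustration).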
To be sure, few lists include all fields. This is why we use various sources. But this shouldn’t stop you from feeling proud if you find a list with all the fields above!
We only collect the profiles of living people. If dealing with a list of "famous Peruvians" or "best cricket players ever", please remove those with a date of death.
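Once a list has been converted to CSV, removing the deceased is a one-line filter. The sketch below assumes a hypothetical `date_of_death` column and drops every row where that cell is non-blank; the sample rows are made up for illustration.

```python
import csv
import io


def living_only(rows, death_column="date_of_death"):
    """Keep only rows whose death-date cell is missing or blank.

    `death_column` is a hypothetical header name; adapt it to the list at hand.
    """
    return [row for row in rows if not (row.get(death_column) or "").strip()]


# Made-up sample list with one living and one deceased person.
sample = io.StringIO(
    "name,date_of_birth,date_of_death\n"
    "Ana Lopez,1990-05-01,\n"
    "Juan Perez,1921-02-03,1999-08-10\n"
)
kept = living_only(csv.DictReader(sample))
print([row["name"] for row in kept])  # ['Ana Lopez']
```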
How to look for sources
First of all, what works in one country may not work in another. The US and India have voters' records, while most other countries don't. France publishes the results of its end-of-high-school examination; Brazil doesn't. And just because Argentina's tennis federation doesn't publish its list of licensed players doesn't mean the badminton federation won't.
Below is a list of sources that have been known to work in at least some countries. They're roughly ordered by level of importance. Getting access to all birth records, for instance, would pretty much negate the need to check other categories. But apart from those, no source covers the entire national population. Voter records are great, but they overlook minors. Conversely, exam results will only include teenagers and young adults. This is why, ultimately, we'll have to explore all sources.
One last remark: knowing that one type of source didn't yield any results in your country is in itself valuable information. We're working on a database that will let new volunteers see what has already been done in their country. That way, they won't bother looking for voters lists and will focus their efforts on, say, sports federations.
Birth records
Most countries do not publish their birth records, or publish them only for births older than a certain cutoff (typically 80 to 100 years). However, citizens can sometimes consult the records in person, usually at no cost. In England, volunteers transcribe birth records and put them online (see example). Needless to say, birth records are the mother of all sources.
Example: https://www.freebmd.org.uk
Suggested queries: "online birth records germany", "public birth records Sri Lanka".
Voters’ records
To our knowledge, voters’ records are public in a very limited number of countries (USA, India, some small islands). If the cost is reasonable, we will reimburse you for any administrative fee you might pay to your local authorities in order to obtain these records. The information usually includes the full identity (very useful in countries that use middle names), the sex and the date of birth (sometimes only the year of birth). When voters’ records are not public, the list of absentee voters sometimes is.
Example: https://www.alabamainteractive.org/sos/voter/voterWelcome.action
Suggested queries: "voters records paraguay", "electoral rolls thailand", "voters list belgium", "list of absentee voters".
School exams
One of the greatest sources as it involves young people and uses official names (no nicknames). Even when the date of birth is not provided, it is usually possible to estimate it give or take 2 years. Lots of countries hold national exams at the end of high school and sometimes middle school.
Example: http://etudiant.aujourdhui.fr/etudiant/resultats/bac/aix-marseille/470-ES/fr/2019.html
Suggested queries: "baccalaureat results france", "A level results bangladesh".
High school graduations
Countries that don’t have national school exams may have some sort of end-of-school graduations. Lists of graduates are sometimes published on the school’s website or in the local newspaper.
Example: https://www.psdschools.org/node/1539
Suggested queries: "Jefferson High School graduates list", "Iowa High School graduation program".
University graduations
Colleges and universities around the world typically have some form of graduation ceremony. The age of graduates can usually be estimated with a two-year confidence margin (meaning plus or minus two years).
Example: https://commencement.missouri.edu/commencement/programs/
Suggested queries: "Penn state commencement program", "Tokyo university graduates list", "list of bachelor graduates", "list of masters graduates", "list of PhD graduates", etc.
Dean’s or President's lists
In some countries, dean's lists or president's lists honor the best university students. In large US colleges, the list can contain as many as 10,000 names!
Example: https://chemistry.berkeley.edu/ugrad/current-students/deans-list/f-2020
Suggested queries: "Berlin University best students", "McGill University president's list", "Wellington College dean's list", "Chemistry Department's president's list", etc.
Faculty
Large universities often publish lists of their thousands of professors and administrative personnel. High schools sometimes do the same, though with smaller numbers.
Example: https://bhs.berkeleyschools.net/information/staff/
Suggested queries: "Faculty Stanford University", "List of teachers Jefferson High School", "Berkeley University directory", "List of professors English department", etc.
Candidates running for office
In France, nearly 1% of the entire population ran as a candidate in one capacity or another in the recent municipal elections. While presidential elections attract only 10 candidates or so, municipal ones can attract hundreds of thousands.
Example: https://www.data.gouv.fr/fr/datasets/elections-municipales-2020-candidatures-au-1er-tour/#_
Suggested queries: "candidates municipal elections 2020", "candidates regional elections 2019".
Political donors
Some countries publish the list of individuals who contribute to political campaigns. In the US, these donors represent about 1% of the population.
Example: https://www.fec.gov/data/receipts/individual-contributions/
Suggested queries: "list political contributors", "list political donations".
Members of political parties, members of unions
Political parties that host primary elections sometimes make the list of their members public. This is not frequent outside of the United States.
Decorations and awards
Every country has its own set of awards and military decorations. Recipients' names are often public. Do not forget to remove posthumous recipients from the lists.
Suggested queries: "Purple Heart recipients", "liste chevaliers Légion d'honneur", "list Italian Order of Merit", etc.
Inmates list
Some countries or states publish the list of their inmates. Just like voters’ records, names are presented in an official format. This avenue shouldn’t be ignored as inmates are often underrepresented in other sources (few vote or have graduated from college). US states also publish separate lists of sex offenders.
Example: https://appgateway.drc.ohio.gov/OffenderSearch/Search/SearchResults
Suggested queries: "list inmates Arkansas", "inmate locator New York".
Administrative exams
States and local authorities recruit lots of agents through dedicated exams, the results of which are generally public. To increase your chances, look for a list of all candidates rather than the list of the happy few who got admitted.
Example: https://www.zra.org.zm/wp-content/uploads/2021/03/List-of-candidates-for-the-ZRA-assessment-test.pdf
Suggested queries: "police constable exam Maharashtra", "administrative assistant exam candidates Northeast region".
Professional exams
Some countries publish the list of people who passed the bar exam, got their medical doctor’s degree, obtained their realtor license, etc.
Example: https://www.cnb.avocat.fr/fr/actualites/resultats-dadmission-lexamen-dentree-au-crfpa-session-2020
Suggested queries: "results librarian exam", "police officers exam candidates", "bar exam results Spain", etc.
Professional directories
In many countries, the professional bodies of doctors, lawyers or nurses publish a list of their members, at either a local or national level.
Example: https://www.acfas.org/directory/default.aspx?search=a&sort=LAST_NAME
Suggested queries: "licensed doctors Nicaragua", "directory doctors Austria", "list of licensed attorneys South Africa", "list of licensed yoga instructors Hong Kong".
Sports federations
The French Federation of Table Tennis lists no fewer than 50,000 male and 10,000 female players! Of course, individual sports lend themselves to rankings more than team sports.
Keep in mind that we have already tapped most international rankings, so chances are we already have the top 50 Japanese tennis players or the top 20 female Australian swimmers. What we’re looking for are the large contingents of amateur or semi-professional athletes. Another benefit of sports is that young people are overrepresented in them. They’re often listed by age group, which allows us to guess their year of birth within one year. Two sports are especially worth looking into: swimming and track & field. But don’t forget tennis, table tennis, squash, badminton, racquetball, fencing, triathlon, golf, gymnastics, ice skating, rock climbing, judo, karate, jumping, sailing, shooting, archery, pool, billiards, bowling, darts...
If you don't find any ranking for a sport, go to that sport's federation website. You'll likely find competition results, names of record holders, etc. One last bit of advice: because of the Covid pandemic, there were far fewer sports competitions in 2020. It is therefore likely that the 2019 ranking (if still available) will include more names than the 2020 ranking.
Example: http://www.fftt.com/site/competition/classement/classement-national
Suggested queries: "ski ranking Sweden", "Australia swimming best times 2019", "list of athletes gymnastics Russia", etc.
Races, including marathons
This is a category in itself. 10Ks and marathons can attract tens of thousands of participants, the list of which is often available online. We have covered the large international marathons, but you should explore national and regional races. You will not be disappointed.
Example: https://www.bmw-berlin-marathon.com/en/impressions/statistics-and-history/results-archive/
Suggested queries: "marathons Italy 2019", "10K races Indonesia".
Mind sports
One shouldn’t forget mind sports. The world chess federation lists almost a million players, but national federations have many more. You can also look into bridge, checkers, backgammon, go, mah-jong, speedcubing, etc.
Example: https://www.worldcubeassociation.org/results/rankings/333/single
Once you've covered all the above categories...
Try less structured queries, such as "download list of participants", "list of attendees pdf", "member list", "members directory", "list conference participants", "list of female candidates", "list of students melbourne", "exam results montana", "examination results london", etc. You will not come back empty-handed.
If you find another category
By all means, let us know! We'll make sure it benefits all the other volunteers.