The Population Project has temporarily suspended its activities. Our database of about 700 million living and 200 million dead people remains freely accessible.

Reimporting US records

Antoine Bello · April 7, 2025

Three months ago, realizing we had way too many duplicates in our database, we stopped importing lists to take a closer look at our algorithms. We now believe we have fixed the most glaring problems. After wiping out the entire database, we’ve started reimporting US records.

What have we learned?

First, we put greater emphasis on dates. We used to accept long lists of names with no birth-related information, thinking they formed a great frame that could later be fleshed out with details. The trouble is that most names are very common. How do you know whether “Roger Brown born in Atlanta” and “Roger Philip Brown” are the same person? Add a matching date of birth to the equation and the odds that they are the same person are multiplied by roughly 25,000. This has allowed us to merge lots of records. Most lists available online do not include dates of birth, but you’d be surprised how many lists with them we’ve been able to find.
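
One way to read the 25,000 figure (our interpretation, not necessarily the project's own calculation) is as a likelihood ratio: if two records are the same person, their dates of birth agree almost surely; if they are two different people, they agree only by chance, about once per number of possible birth dates. A window of roughly 70 years gives about 70 × 365 ≈ 25,000 dates. The 70-year window, the 1-in-1,000 prior odds and the function names below are hypothetical.

```python
# Assumption (for illustration only): unrelated people sharing a name have
# birth dates spread roughly evenly over a window of about 70 years.
DAYS_PER_YEAR = 365.25

def dob_match_likelihood_ratio(window_years: float = 70) -> float:
    """Factor by which a matching date of birth multiplies the odds of a match."""
    possible_dates = window_years * DAYS_PER_YEAR
    p_match_if_same = 1.0                      # same person: dates agree
    p_match_if_different = 1.0 / possible_dates  # strangers: agree by chance
    return p_match_if_same / p_match_if_different

def update_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds x likelihood ratio."""
    return prior_odds * likelihood_ratio

def odds_to_probability(odds: float) -> float:
    return odds / (1.0 + odds)

# Hypothetical prior: 1-in-1,000 odds that two "Roger Brown" records match.
lr = dob_match_likelihood_ratio()
posterior = update_odds(1 / 1000, lr)
print(round(lr), round(posterior, 2), round(odds_to_probability(posterior), 3))
```

Under these assumptions, a name-only match that was a long shot becomes a very likely one once the birth dates agree.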

Second, we no longer infer the year of birth. When processing the results of the French baccalauréat, for instance, we used to infer that most candidates were between 17 and 21 years old, or, in our parlance, 19 +/- 2. We’ve dropped this practice because even a mere 5% of candidates falling outside these bounds translated into tens, if not hundreds, of thousands of errors in our database. The only exception we make is to deduce the birth year of a deceased person from their age at death. If someone died in 2022 at the age of 72, we can say with absolute certainty that they were born in 1949 or 1950.
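
The one inference still allowed is a bounded one, which a minimal sketch makes explicit. The function names are hypothetical: the birth year is `death_year - age` if the birthday had already passed when the person died, or one year earlier if it had not. The second helper restates a range in the "19 +/- 2" style the post mentions, as a center and a half-width.

```python
def birth_year_range(death_year: int, age_at_death: int) -> tuple[int, int]:
    """Earliest and latest possible birth year given age at death."""
    latest = death_year - age_at_death  # birthday already passed that year
    earliest = latest - 1               # birthday not yet reached that year
    return earliest, latest

def as_center_tolerance(low: float, high: float) -> tuple[float, float]:
    """Express an inclusive range as (center, half-width), e.g. 17..21 -> 19 +/- 2."""
    return (low + high) / 2, (high - low) / 2

print(birth_year_range(2022, 72))    # the example from the post
print(as_center_tolerance(17, 21))   # the old baccalauréat age assumption
```

The difference is in the width of the interval: the deceased-person rule is always exact to within one year, whereas the baccalauréat rule was a guess that a share of candidates would fall outside.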

Third, we stay away from unofficial lists. They present too many risks, from nicknames (“Bill” for “William”) to truncated names (“Jose Michel Gonzal”), and many are simply too short to be worth our while. In a nutshell:

  • We love electoral and vital records, obituaries, exam results, lists of social aid recipients and certified professionals.

  • We like professional electoral colleges, lists of sports federation members and college graduates.

  • We are wary of marathon results, lists of trade show and conference attendees, political donors and delinquent taxpayers.

Of course, this is all relative. While we can afford to be picky in Western countries, we might have to lower our standards in Africa or Asia.
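
The three tiers above, and the idea that the bar can move by region, could be expressed as a simple lookup. This is a hypothetical sketch: the source-type keys, the numeric tiers and the per-region minimum tiers are our own illustration, not the project's actual configuration.

```python
from enum import IntEnum

class Trust(IntEnum):
    WARY = 1  # "we are wary of"
    LIKE = 2  # "we like"
    LOVE = 3  # "we love"

# Hypothetical mapping of list types to the tiers described in the post.
SOURCE_TIERS = {
    "electoral_roll": Trust.LOVE,
    "vital_records": Trust.LOVE,
    "obituaries": Trust.LOVE,
    "exam_results": Trust.LOVE,
    "social_aid_recipients": Trust.LOVE,
    "certified_professionals": Trust.LOVE,
    "professional_election_colleges": Trust.LIKE,
    "sports_federation_members": Trust.LIKE,
    "college_graduates": Trust.LIKE,
    "marathon_results": Trust.WARY,
    "event_attendees": Trust.WARY,
    "political_donors": Trust.WARY,
    "delinquent_taxpayers": Trust.WARY,
}

# Hypothetical per-region minimum tier: stricter where good lists abound.
MIN_TIER = {"western": Trust.LIKE, "default": Trust.WARY}

def accept(source_type: str, region: str) -> bool:
    """Decide whether a list of this type is imported for this region."""
    tier = SOURCE_TIERS.get(source_type)
    if tier is None:  # unofficial or unknown list types are never imported
        return False
    return tier >= MIN_TIER.get(region, MIN_TIER["default"])
```

Keeping the tiers ordered (an `IntEnum`) means "lowering our standards" in a region is a one-line change to its minimum tier rather than a new set of rules.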

Fourth, we have improved our geographic system. Each location, whether a city, a region or a country, has a unique ID whose structure mirrors the chain of places that contain it. Just by looking at their IDs, we now know that a person born in Boston and a namesake born in Massachusetts could actually be the same person.
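
A minimal sketch of why a hierarchical ID makes this check cheap, assuming a hypothetical scheme in which each ID is its parent's ID plus one more dash-separated segment (country, then region, then city, loosely in the style of ISO 3166-2 codes such as `US-MA`). The post does not describe the project's real ID format. Two birthplaces are compatible when one contains the other; segments are compared whole so that a raw string prefix does not produce false matches.

```python
def location_path(location_id: str) -> list[str]:
    """Split a hypothetical hierarchical ID such as 'US-MA-BOSTON' into segments."""
    return location_id.split("-")

def could_be_same_birthplace(a: str, b: str) -> bool:
    """True if one location contains the other, or they are identical.

    Boston vs. Massachusetts -> compatible (Boston lies inside Massachusetts).
    Boston vs. Cambridge     -> incompatible (two different cities).
    Whole segments are compared, so 'US-MA' is not a prefix of 'US-MAX'.
    """
    pa, pb = location_path(a), location_path(b)
    shorter, longer = (pa, pb) if len(pa) <= len(pb) else (pb, pa)
    return longer[: len(shorter)] == shorter

print(could_be_same_birthplace("US-MA-BOSTON", "US-MA"))
print(could_be_same_birthplace("US-MA-BOSTON", "US-TX"))
```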

Last, we’ve put in place stringent mechanisms to detect typos. In the past, when confronted with a “Frederack Smith”, we would have thought: “We don’t know this first name. Let’s add it to our list and import this person.” We now put the record on hold until we encounter more Frederacks. Past a certain number of occurrences, we create the records that were on hold; otherwise, we delete them. This system has helped us identify scores of bad records, with first names such as “MichaelT” or “CarolM” that were in fact “Michael T” and “Carol M”.
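
The hold-and-release logic can be sketched as a small quarantine keyed by first name. The class name, the record format (plain dicts with a `first_name` key) and the default threshold of 5 are all hypothetical; the post does not say how many occurrences are required.

```python
from collections import defaultdict

class FirstNameQuarantine:
    """Hold records with unknown first names until the name recurs enough."""

    def __init__(self, known_names, threshold: int = 5):
        self.known = set(known_names)
        self.threshold = threshold         # hypothetical occurrence cutoff
        self.pending = defaultdict(list)   # unknown first name -> held records

    def submit(self, record: dict) -> list[dict]:
        """Return the records that can be imported now (possibly none)."""
        name = record["first_name"]
        if name in self.known:
            return [record]
        self.pending[name].append(record)
        if len(self.pending[name]) >= self.threshold:
            # Enough independent sightings: accept the name, release its records.
            self.known.add(name)
            return self.pending.pop(name)
        return []

    def purge(self) -> list[dict]:
        """Discard (and return) records whose name never reached the threshold."""
        dropped = [r for held in self.pending.values() for r in held]
        self.pending.clear()
        return dropped

q = FirstNameQuarantine({"Frederick", "Michael"}, threshold=3)
q.submit({"first_name": "Frederack", "last_name": "Smith"})  # held
q.submit({"first_name": "MichaelT", "last_name": "Jones"})   # held
print(q.purge())  # both held records are dropped
```

Releasing a name's records only once it has been seen several times means a one-off typo such as “MichaelT” never reaches the database, at the cost of delaying genuinely rare names.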

The first three countries we plan to import are the United States, the United Kingdom and France.

We will also announce important new features soon.

Thank you for your continued support. We once compared the Population Project to a marathon. It looks more and more like some insane ultra-running race!

A nonprofit organization striving to compile a list of every living person’s full name and place and date of birth.