How many couples of the same name are there?

TL;DR: probably around 10k couples in the US have the same name, and about 60% of them are gay men.

Welcome to the first lab output of Twinkdestroyer Analytics. While we were touching grass in upstate New York, my buddy Anita said, "I wonder how many couples have the same names." As the Bible said, when the woman asks, the man shall answer. We can absolutely estimate this.

First, I downloaded the 1996-2017 NYC Marriage Data from here. This data has the first, middle, and last names of bride and groom, and it includes all same-sex marriages after June 2011.

Across all the data, about 0.045% of couples have the same name. That's 728 couples out of 1.6 million marriages. A lot of these names are ethnic (e.g. Chinese transliterations), so that's not a good estimate at all.
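A minimal sketch of that counting step, assuming "same name" means the two first names match after trimming whitespace and ignoring case. The sample rows and the function names are made up for illustration; they are not the real NYC columns.

```python
def normalize(name: str) -> str:
    """Collapse whitespace and ignore case so 'LEE ' and 'lee' compare equal."""
    return " ".join(name.split()).casefold()


def same_name_share(couples):
    """Return (number of same-name couples, their share of all couples).

    `couples` is a list of (first_name_a, first_name_b) pairs.
    Blank names never count as a match.
    """
    same = sum(
        1
        for a, b in couples
        if normalize(a) and normalize(a) == normalize(b)
    )
    return same, same / len(couples)


# Hypothetical rows, only to show the shape of the input.
couples = [
    ("Michael", "Michael"),
    ("Anna", "David"),
    ("LEE ", "lee"),
    ("Maria", "Mario"),
]

same, share = same_name_share(couples)
print(f"{same} of {len(couples)} couples share a name ({share:.2%})")
```

Exact matching like this will miss spelling variants and transliterations of the same name, which is part of why the raw number is shaky.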

I feel like it's reasonable to assume that gay couples are more likely to have the same names. So I only looked at the data from 2011 onward (which includes gay marriages).

I wanted to do some mid-level NLP bullshit to make ethnically informed guesses about the gender of bride and groom, but I soon realized that is overkill. (And that I have no compute.) Instead, I installed a good enough package called gender_guesser and called it a day. I looked at the first and middle name of bride and groom separately, and wrote a small program to label marriages as gay, straight, lesbian, or unknown.
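Here is a rough sketch of what a labeling program like that could look like. It is not the actual script: the fallback rule (try the first name, then the middle name) and every function name are assumptions. To keep it runnable without installing anything, it uses a tiny made-up lookup table in place of the real guesser. With the package installed, something like `gender_guesser.detector.Detector().get_gender` should be usable as the `guess` argument instead, since it returns strings such as "male", "mostly_male", "female", "mostly_female", "andy" and "unknown".

```python
from typing import Callable, Tuple

# Hypothetical stand-in for a real name-to-gender guesser.
TOY_GENDERS = {
    "michael": "male",
    "david": "male",
    "maria": "female",
    "anna": "female",
}


def toy_guess(name: str) -> str:
    """Look a name up in the toy table; anything else is 'unknown'."""
    return TOY_GENDERS.get(name.strip().lower(), "unknown")


def person_gender(first: str, middle: str, guess: Callable[[str], str]) -> str:
    """Guess from the first name, falling back to the middle name."""
    for name in (first, middle):
        if not name:
            continue
        g = guess(name)
        if g in ("male", "mostly_male"):
            return "male"
        if g in ("female", "mostly_female"):
            return "female"
    return "unknown"


def label_marriage(
    spouse_a: Tuple[str, str],
    spouse_b: Tuple[str, str],
    guess: Callable[[str], str] = toy_guess,
) -> str:
    """Label a marriage as gay, lesbian, straight, or unknown.

    Each spouse is a (first_name, middle_name) pair.
    """
    a = person_gender(spouse_a[0], spouse_a[1], guess)
    b = person_gender(spouse_b[0], spouse_b[1], guess)
    if "unknown" in (a, b):
        return "unknown"
    if a != b:
        return "straight"
    return "gay" if a == "male" else "lesbian"


print(label_marriage(("Michael", ""), ("David", "")))
print(label_marriage(("Maria", ""), ("Anna", "")))
print(label_marriage(("Xiu", ""), ("David", "")))
```

Any couple where either spouse can't be guessed lands in "unknown", which is how a big chunk of the data ends up unclassified.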

Although I couldn't classify 30% of my data (again, ethnic names make it hard), I found that 51565 out of 590097 marriages between 2011 and 2017 are gay/lesbian. That's 8.74%, which is so much higher than the nationwide stat that 1.2% of all marriages are same-sex. But just when I thought my good-enough regex classification scheme wasn't good enough, I found this article that told me 9% of all marriages in NYC are gay. Lol

After getting rid of couples I couldn't classify as gay or straight, I found that a total of 0 straight couples have the same names. This is obviously not true (probably a Sharon&Sharon/Lee&Lee marriage out there, for example) but for our purposes it's good enough to say 0% of straight couples have the same names.

% of gay/lesbian marriages: 51565 / 590097 = 8.74% 
% of gay marriage w same names: 332 / 31203 = 1.06% 
% of lesbian marriage w same names: 130 / 20362 = 0.64%

According to the Williams Institute, there are 1.3 million same-sex couples in the US: 46.13% are gay and 53.87% are lesbian. Now we can calculate 0.0106 * 0.4613 * 1300000 + 0.0064 * 0.5387 * 1300000 = 10852 couples (that's with the unrounded rates; the rounded ones shown here give about 10839). There are probably 10k couples in the US with the same name, and just under 60% of them (about 59%) are gay men.
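The extrapolation is easy to redo from the raw counts. This sketch uses only the numbers quoted in this post and keeps the same-name rates unrounded; the variable names are mine.

```python
# Counts from the 2011-2017 NYC data quoted above.
gay_same, gay_total = 332, 31203
lesbian_same, lesbian_total = 130, 20362

# Williams Institute figures quoted above.
us_same_sex_couples = 1_300_000
gay_share_of_couples = 0.4613
lesbian_share_of_couples = 0.5387

# Unrounded same-name rates.
gay_rate = gay_same / gay_total
lesbian_rate = lesbian_same / lesbian_total

# Scale each rate up to the national number of couples of that type.
gay_estimate = gay_rate * gay_share_of_couples * us_same_sex_couples
lesbian_estimate = lesbian_rate * lesbian_share_of_couples * us_same_sex_couples

total = gay_estimate + lesbian_estimate
gay_fraction = gay_estimate / total

print(f"gay same-name couples:     {gay_estimate:,.0f}")
print(f"lesbian same-name couples: {lesbian_estimate:,.0f}")
print(f"total:                     {total:,.0f}")
print(f"share that are gay men:    {gay_fraction:.1%}")
```

This treats the NYC same-name rates as representative of the whole country and counts zero straight couples, so it is a ballpark figure and nothing more.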

To close off, here are the top 20 names for same-name couples in my dataset:

[Figure: top 20 names for same-name couples (newplot.png)]

Thank you for reading my in-house data analysis that could have been done by an 11-year-old from Shandong. Stay tuned for more twinkdestroyer analytics.