Blog

Acquisition Coverage

The news of Automattic’s acquisition of Lean Domain Search spread faster and wider than I ever would have imagined.

Here are a few of the news outlets that covered it:

Thank you all for your support!

Lean Domain Search Acquired by Automattic

I’m thrilled to announce that Lean Domain Search has been acquired by Automattic, the company behind WordPress.com!

At Automattic, I’ll be working full time on making it even easier for WordPress.com users to find and register great domain names for their websites and blogs.

What does this mean for Lean Domain Search? Not only will it continue to run, but it’s also now completely free to use. There is even now an option in Lean Domain Search to register domain names directly through WordPress.com, making it easier than ever to start a website with a great domain name.

I’d like to extend a special thanks to my wife, my family and friends, the Orlando tech community, and Lean Domain Search’s users for their tireless support and feedback. Lean Domain Search wouldn’t be what it is without all of you.

If you’re new to Lean Domain Search and want to check it out for yourself, you’re welcome to head on over to the homepage to perform your own search. Cheers!

Lean Domain Search Users Have Now Performed Over 500,000 Searches

I’m happy to announce that Lean Domain Search performed its 500,000th search today. That’s not a huge milestone in the grand scheme of things, but it’s certainly one that I’m proud of. Thank you all for continuing to use it, share it, and for providing feedback and encouragement along the way.

Performance improvements update

Over the past few months a few folks have emailed me to report slow search times. I’ve spent the last few weeks trying to optimize every aspect of Lean Domain Search’s domain name generation process and am happy to report that it has paid off.

At the end of April the average search took over six seconds; it now takes less than two:

Put another way, at the end of April only about 32% of searches took less than three seconds. That number is now around 86%:

A lot of data

There were a number of different areas that were improved, but there’s one in particular that I’d like to talk about because some of the developers out there might find it helpful for their own applications.

When you perform a search on Lean Domain Search, it executes a JSONP request to a separate server to generate available domain names based on your search term. If the request times out or throws an error, I track that event via MixPanel. By looking at the number of search errors and segmenting by operating system, I noticed an interesting trend:

The majority of errors were occurring for Windows users. Why though? My first instinct was that it must be an issue with Internet Explorer. Internet Explorer has a reputation for being the source of cross-browser compatibility issues so it makes sense that it would be the cause of these problems. However, segmenting the search errors by browser type showed that that was not the case:

In reality, Chrome and Firefox users encountered more errors as a percentage of the total than Internet Explorer users. If the browser wasn’t causing the issue for Windows users, then what was?

An Inside Look at Lean Domain Search’s Brandable Domain Name Generation Algorithm

At the end of March, I launched Lean Domain Search’s new Brandable Domain Names section. Brandable domain names, for those of you not familiar with the term, are domain names that can be used for a wide variety of websites. Think names like Obsera and Innoviza. The names don’t convey the site’s purpose so they can be branded for pretty much anything. Since the launch over 1,000 brandable domain names have been released at a rate of 1 per hour. At the time of this writing, almost 20% of them have been registered.

I’ve received a few emails about how I generate these domain names so I figured I’d write up a short blog post explaining the process. It’s slightly complicated, but hopefully by the end you will have a pretty good idea of how it works. This tutorial won’t contain any code, though you are free to implement the algorithm on your own if you’d like to experiment with it.

How it Works

The key to generating good brandable domain names is to ensure that they are pronounceable. This is easier said than done though. If you throw a bunch of letters together randomly you’ll more than likely wind up with something that is entirely unpronounceable. What you need is some list of letter combinations that can be pronounced easily. While not comprehensive, a standard English dictionary is a great place to start.

What if we took every English word that ends in US and replaced the US with an A?

The English word list might look like this:

  • abacus
  • abstemious
  • acanthus
  • acrimonious
  • adieus
  • adulterous
  • advantageous
  • adventurous
  • airbus
  • alumnus
  • ambidextrous
  • ambiguous

Replacing the trailing US’s with A’s changes the names to:

  • abaca
  • abstemioa
  • acantha
  • acrimonioa
  • adiea
  • adulteroa
  • advantageoa
  • adventuroa
  • airba
  • alumna
  • ambidextroa
  • ambiguoa

By generating domain names based on English words that we already know are pronounceable, we’ve managed to generate a list of domain name ideas that are also mostly pronounceable. There are some exceptions: if the original word ended in IOUS or OUS then it now ends in IOA or OA, which is not very pronounceable, but we can add some rules that say to disregard those when coming up with the actual list.
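As a rough illustration, here is a minimal Python sketch of the suffix swap, including a rule that discards results ending in IOA or OA. The function name and its parameters are made up for this example and are not taken from Lean Domain Search itself.

```python
def swap_suffix(words, old="us", new="a", skip_endings=("ioa", "oa")):
    """Replace a trailing `old` with `new`, discarding hard-to-pronounce results."""
    results = []
    for word in words:
        if not word.endswith(old):
            continue
        candidate = word[: -len(old)] + new
        # Rule from the text: names that now end in IOA or OA are disregarded.
        if candidate.endswith(skip_endings):
            continue
        results.append(candidate)
    return results


words = ["abacus", "abstemious", "acanthus", "airbus", "alumnus", "ambiguous"]
print(swap_suffix(words))  # ['abaca', 'acantha', 'airba', 'alumna']
```

Any extra pronounceability rules would simply be added as further checks inside the loop.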

Using a dictionary of common English words is a good start, but it’s somewhat limited. In the list of common English words that I am using, there are 602 words that end in US. Not bad, but not a huge number either. Is there somewhere else we can look?

Using the Zone File for Inspiration

Enter VeriSign’s zone file. This list, published daily, contains most of the registered .com domain names in existence. With over 100 million domain names and counting, this is an invaluable resource for generating domain name ideas. For every registered domain name out there, someone decided that it was good enough to pay money for, which means that it is more likely than not to be pronounceable. By looking at existing domain names and making slight modifications we can generate domain names of our own that are also likely pronounceable. There are a few things we need to do first though.

The zone file that I am using contains 609,888 .com domain names that end in US. If all we did was replace the trailing US with an A for all of those domain names you’d wind up with a pretty bad list of domain name ideas. For example, the original list would contain domain names such as:

  • telcoplus
  • andreassophocleous
  • spyralplus
  • ourscampus
  • damaus
  • tutututus
  • unique2us
  • guerillamarketingplus
  • lifehaus
  • faziosworldfamous
  • janjakstatkus
  • bookious

Replacing the trailing US with A’s results in:

  • telcopla
  • andreassophocleoa
  • spyralpla
  • ourscampa
  • damaa
  • tutututa
  • unique2a
  • guerillamarketingpla
  • lifehaa
  • faziosworldfamoa
  • janjakstatka
  • bookioa

Not a bad start, but there are still a lot of problems: some are not pronounceable, some contain numbers, and others contain words that make them unusable as company names (guerillamarketingpla will be read as Guerilla Marketing Pla, for example — a name no one would want to use). Some of these issues can be mitigated with various rules: only look at domain names of a certain length, ignore ones with numbers and dashes, etc, but you still have a problem that some convey meaning. ourcampa would pass all of our rules, but it’s still not a good domain name. What to do?

The Importance of Common Roots

The root of a domain name is the part of the domain name that does not contain its suffix or prefix. For example, in a domain name like github, hub is the suffix and git is the root. In a domain name like ourcampus, our is the prefix and campus is the root. What if instead we said that ourcamp is the root and us is the suffix? That’s not what most of us would consider the root and suffix, but bear with me for a second.

What if you looked at all of the roots for domain names that end in US and compared them to the roots of all domain names that end in, say, IS? By looking at the roots that they have in common, we’ll likely end up with a pretty good list of pronounceable roots. And because each of these roots is registered with multiple suffixes, there’s a good chance that it doesn’t have an actual meaning (for example, guerillamarketingplus would be a result for US, but guerillamarketingplis with an IS is unlikely to be registered so guerillamarketingpl wouldn’t make the list of common roots).
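The common-roots idea boils down to a set intersection. The sketch below assumes the registered names are already loaded into memory with the .com stripped; the helper names and the tiny sample set are illustrative only.

```python
def roots_for_suffix(domains, suffix):
    """Strip `suffix` from every domain that ends with it and return the set of roots."""
    return {
        name[: -len(suffix)]
        for name in domains
        if name.endswith(suffix) and len(name) > len(suffix)
    }


def common_roots(domains, suffix_a, suffix_b):
    """Roots that are registered with both suffixes."""
    return roots_for_suffix(domains, suffix_a) & roots_for_suffix(domains, suffix_b)


# Tiny stand-in for the zone file (names without the .com).
registered = {"vizalus", "vizalis", "sitellus", "sitellis", "guerillamarketingplus"}
print(sorted(common_roots(registered, "us", "is")))  # ['sitell', 'vizal']
```

With a real zone file the two sets would hold hundreds of thousands of roots each, but the intersection works the same way.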

There are 609,888 .com domain names that end in US which means there are 609,888 roots for those domain names. There are 540,887 .com domain names that end with IS and therefore 540,887 roots. Of those there are 26,782 roots in common. By adding a new suffix such as A to these common roots, you wind up with some pretty good domain name ideas. If you restrict it to results that are between 5 and 9 characters (all 1, 2, 3, and 4 letter .coms are registered and 10 or more for a brandable name tends to be too long), remove all of the names that contain numbers and dashes, and apply a few rules (no domain names that end in UA, YA, ZA, etc) you can reduce the list to a mere 10,021 domain name ideas. These domains include:

  • sitella
  • vizala
  • atoura
  • applieda
  • mymana
  • mirsala
  • greva
  • igena
  • ideasa
  • sponsora
  • latra
  • spirala

To drive the point home: if you replace the trailing A with a US or an IS then those domain names are registered. For example, the presence of Vizala means that Vizalus and Vizalis are registered.
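Turning common roots into candidate names and applying the filters described above might look like the following sketch. The list of bad endings only includes the three named in this post (the full rule set isn’t given), and the function and constant names are hypothetical.

```python
import re

# Only the endings named in the post; the real rule list is longer but not given.
BAD_ENDINGS = ("ua", "ya", "za")


def brandable_candidates(roots, new_suffix="a", min_len=5, max_len=9):
    """Append `new_suffix` to each root and keep names that pass the quality rules."""
    names = []
    for root in sorted(roots):
        name = root + new_suffix
        if not min_len <= len(name) <= max_len:
            continue  # too short to be unregistered, or too long to be brandable
        if not re.fullmatch(r"[a-z]+", name):
            continue  # drops names containing numbers or dashes
        if name.endswith(BAD_ENDINGS):
            continue
        names.append(name)
    return names


roots = {"vizal", "sitell", "unique2", "gre", "blitz", "guerillamarketingpl"}
print(brandable_candidates(roots))  # ['sitella', 'vizala']
```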

At this point we have a list of domain name ideas but we haven’t checked to see which are still available to register. After running them through an availability checking script, we’re left with 1,211 domain names. These include:

  • quantila
  • arvenda
  • prodova
  • netixa
  • innovira
  • holdena
  • cypera
  • vangela
  • relatora
  • primorda
  • tacticala
  • ubiquida

Not bad, right?

The final step for me is to manually review these results. As good as this method is at generating available brandable domain names, it still comes up with some bad names so I wind up reviewing the results and selecting which ones to add to Lean Domain Search.

By playing with which suffixes it checks the roots for and what new suffixes to add to the common roots, this algorithm can be used to generate thousands of great available domain names.

Summary

To recap, here’s how the algorithm works:

  1. Determine all of the registered .com domain names that end with specific suffixes (US and IS in this case)
  2. Figure out the roots for those domain names and determine which ones they have in common
  3. Add a new suffix to those roots (A in this case)
  4. Programmatically remove domain names that indicate low quality (numbers, dashes, length, certain letter combinations, etc)
  5. Check which of those domain names are available
  6. Manually review the results for quality

If you have any questions, feedback, or ideas on how to improve it, please feel free to leave a comment or email us at help@wordpress.com.

Lean Domain Search and the Coming Change in Domain Name Search

DomainNameWire published a really interesting article last week titled The Coming Change in Domain Name Search where Andrew Allemann talks about the impact of gTLDs on domain search:

This year will truly be the end of an era. Saying it’s the end of an era because of the elimination of the drop down box, a staple of domain search for over a decade, misses the point. The bigger story is a move from “dumb search” to “smart search”. Most registrars currently have basic search systems. Give them a query, and they’ll respond with the match. That simply won’t cut it going forward. Smart domain search is a big, complex problem.

For a practical example, consider a pizza shop owner looking for a domain name:

Perhaps it makes sense to show johns.pizza, johns.restaurant, johnspizza.restaurant, or johnspizza.food to this customer. Showing these top level domains to most consumers, such as those that search for generic terms, doesn’t make sense. Registrars will need to determine when it makes sense to show a new top level domain ahead of .com in the search results. In the case of a search for “Johns Pizza Shop”, showing JohnsPizzaShop.com makes no sense because it is already registered. Unless it’s for sale on the aftermarket, the registrar won’t be able to convert a registration for the domain. Search results may include other .com domains, but registrars will need to analyze their data to determine exactly what to promote for each individual search.

Lean Domain Search is in a good position to help bring order to the coming chaos, but the real question is whether or not it makes sense to do so.

In the past I experimented with including a link to the .net and .org search results for your search (the default is .com) and only about 2% of visitors viewed them. Let me say that again because it’s worth emphasizing: When presented with the option of viewing potentially better .net and .org domain names, only about 1 in 50 visitors even bothered checking them out.

That raises the question: does it even make sense to include the results from the new gTLDs?

The answer might lie in a hybrid approach: rather than letting you view the full search results for other TLDs (.net, .org, .restaurant, etc) maybe I only display the exact-match domain names if they are available and I do that directly above the .com search results. If you search for “JohnsPizza”, for example, Lean Domain Search will return the standard .com search results, but it also checks whether JohnsPizza.net, JohnsPizza.org, JohnsPizza.food, etc are available.

It might look something like this:

And then maybe based on the search term, it could prioritize how to order the gTLD results. For example, if your search term includes something related to food, then show the food ones (.restaurant, .food, .pizza) above the tech ones (.app, .blog, etc).
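One way such an ordering could be sketched in Python is shown below. Everything here is an assumption made for illustration: the keyword lists, the TLD groups, where .net and .org sit in the ordering, and the `is_available` callback that stands in for a real availability check.

```python
# Hypothetical keyword-to-TLD groups; a real system would need far richer data.
TLD_GROUPS = [
    ({"pizza", "restaurant", "food", "cafe"}, [".pizza", ".restaurant", ".food"]),
    ({"app", "software", "code", "blog"}, [".app", ".blog"]),
]
BASE_TLDS = [".net", ".org"]


def exact_match_candidates(search_term, is_available=lambda domain: True):
    """Return exact-match domains, with TLDs related to the search term listed first."""
    label = "".join(search_term.lower().split())
    relevant, other = [], []
    for keywords, tlds in TLD_GROUPS:
        # A group is relevant when any of its keywords appears in the search term.
        bucket = relevant if any(k in label for k in keywords) else other
        bucket.extend(tlds)
    ordered = [label + tld for tld in relevant + BASE_TLDS + other]
    return [domain for domain in ordered if is_available(domain)]


print(exact_match_candidates("JohnsPizza"))
```

For a search like “JohnsPizza” this lists the food-related TLDs first, then .net and .org, then the tech ones. Simple substring matching will misfire on some terms, so treat it as a starting point rather than a solution to the hard problem described here.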

It’s a hard problem and one that most of the major registrars are probably looking at right now. It will be interesting to see how it goes.

What do you think? Should Lean Domain Search include search results from the other TLDs or stay focused on .coms? I’d love to hear your thoughts in the comments below.

On the Performance of Premium Domain Names on Lean Domain Search

At the end of November 2012 I partnered with Sedo to include premium domain names in the Lean Domain Search search results.

Premium domain names — shown in orange below — have been displayed above the available search results so that you can consider them when making a decision on which domain name to buy for your website:

After three months of displaying premium domain names, here are the affiliate commission results:

To sum it up:

  • 751 people have clicked on one of the “Buy Now With Sedo” buttons.
  • Of those 751 visitors, 13 created a Sedo account indicating that they were serious about purchasing a premium domain name.
  • Of those 13 people who created an account, 0 completed a purchase.

Interestingly, I did get credit for two domain name transfers to the tune of €52 (about $68).

It’s worth noting that I filtered the domain names I received via Sedo’s search API to ensure that the quality of the domains was up to par. For example, if a user searched for “task” and the API returned taskhhh it wouldn’t be displayed on Lean Domain Search because hhh is not one of the suffixes that it recognizes. Domains like taskblog and taskhub would have been displayed because blog and hub are recognized suffixes. My point is that the quality of the domain names was good but still there were no takers.
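The suffix filter described here can be sketched in a few lines of Python. The suffix set below contains only the two suffixes named in this post; the real list and the function name are not given, so treat both as placeholders.

```python
# Only the suffixes named in the post; the real list of recognized suffixes is larger.
RECOGNIZED_SUFFIXES = {"blog", "hub"}


def has_recognized_suffix(search_term, domain, suffixes=RECOGNIZED_SUFFIXES):
    """Keep a premium domain only if it is the search term plus a recognized suffix."""
    term = search_term.lower()
    name = domain.lower()
    if not name.startswith(term):
        return False
    return name[len(term):] in suffixes


for domain in ["taskhhh", "taskblog", "taskhub"]:
    print(domain, has_recognized_suffix("task", domain))
# taskhhh False
# taskblog True
# taskhub True
```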

Why is this? It’s an interesting question.

Given that people using Lean Domain Search are actively looking for a domain name you’d expect that at least some would like the premium domain names enough to buy one (after all, they are by definition supposed to be of higher quality than the available ones).

I have three theories:

1. The type of people who use an available domain name generator are not the type of people who want to pay for a premium domain name.

Lean Domain Search is extremely popular among the startup crowd. Maybe startup founders are more tech savvy and therefore more likely to use a domain name generator and therefore less likely to pay for a premium domain name. Or maybe they’re more frugal because they’re on a startup budget.

2. When displayed next to quality available domain names, the premium domain name prices don’t look quite so good.

TaskTime is a good domain name, but is it really worth $15,000 more than a domain name like TaskStyle? If you’re only comparing premium domain names it’s easier to convince yourself that an expensive domain name is worth it. But when you have a choice between a $15,000 premium domain name and a $9.99 available one, it’s less likely you’re going to buy the premium one.

3. There simply wasn’t a large enough sample size.

GoDaddy also partners with Sedo to display premium domain names in their search results:

And I’ve heard from several domain name investors that it does result in sales. But maybe the reason they receive sales from GoDaddy and not Lean Domain Search is simply a matter of size. If Lean Domain Search gets 40,000 searches per month and GoDaddy gets 40,000 per day then obviously GoDaddy is going to generate more sales.

Because the premium domain names have not resulted in any sales and because it significantly slows down the search process (Lean Domain Search has to make two queries to Sedo’s API for every search, one for domains that begin with your search term and one for domains that end with your search term), I’ve decided to remove premium domain names from Lean Domain Search result pages.

When the results are removed in a few days Lean Domain Search will be significantly faster for new queries to the point where I might even be able to Ajaxify the results down the road (where you see results as you type).

If you do feel strongly about this one way or the other — or if you have any thoughts on why the premium domain names aren’t performing — I’d love to hear them in the comments below.

Why were 98 domain names beginning with LEVI-S registered in China on January 29th?

Two weeks ago I launched Domain Name Trends, a new tool on Lean Domain Search that lets you visualize how many .com domain names have been registered over time for a given topic. Along with the main search tool I also released a hot topics feature which analyzes all of the .com domain name registrations on a given day and attempts to identify topics that saw a lot of registrations.

Most of the time the hot topics analysis identifies terms that domain name investors are in the process of speculating on. Recent terms include eurovegas, wedding venues and motorcycles for sale. On January 29th, however, the hot topics analysis tool identified a term that I had never seen before: LEVI-S.

As you can see from the search results LEVI-S saw an explosion in domain name registrations on January 29th:

What was so special about this term that led to so many domain name registrations that day? A quick Google search for levi-s doesn’t turn up any relevant results for that term. The sites themselves do not appear to be in use either.

Composition analysis

Looking over the actual domain names I noticed a perplexing similarity: Not only do all of the domain names begin with LEVI-S, but the next part of the domain name always was a single English word beginning with the letter F: LEVI-SFACT, LEVI-SFACULTY, LEVI-SFAIL, LEVI-SFAIR, LEVI-SFAITHFUL, etc:

Given the choice of words paired with LEVI-S, it was seeming less and less likely that whoever registered these domain names was doing it for speculation purposes. After all, if someone truly believed that LEVI-S was going to be a popular topic there are far better domain names they could have speculated on including LEVI-SONLINE, LEVI-SSPOT, LEVI-SHUB and so on.

Registrant analysis

If the domain names themselves weren’t of any help, maybe I could determine their purpose based on who registered them. I wrote a quick script to check the WHOIS information, parse the results, and output it as a CSV file:

You can download the CSV file yourself here (2KB).

The 98 domains were registered by 37 unique individuals: 29 people registered 3 domain names, 4 registered 2 domain names, and 3 registered 1 domain name.
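For anyone who wants to run a similar tally, here is a small Python sketch that takes already-parsed (domain, registrant) pairs, writes them as CSV, and counts how many registrants hold a given number of domains. The WHOIS lookup itself is left out, and the registrant labels in the sample are placeholders rather than real WHOIS data.

```python
import csv
import io
from collections import Counter


def registrant_breakdown(rows):
    """Map 'number of domains registered' to 'number of registrants with that many'."""
    per_registrant = Counter(registrant for _, registrant in rows)
    return dict(Counter(per_registrant.values()))


def to_csv(rows):
    """Serialize (domain, registrant) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["domain", "registrant"])
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder registrants; only the domain names come from the post.
rows = [
    ("levi-sfact", "registrant-a"),
    ("levi-sfaculty", "registrant-a"),
    ("levi-sfail", "registrant-a"),
    ("levi-sfair", "registrant-b"),
    ("levi-sfaithful", "registrant-b"),
]
print(registrant_breakdown(rows))  # {3: 1, 2: 1}
```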

DomainTools has a service that lets you look up domain names given information about who registered them (a reverse WHOIS). Unfortunately the only domain names registered to these individuals appear to be the ones they registered here:

Moreover, all 37 registrants were located in China, though not in the same location:

Summary

To sum it up: 98 .com domain names beginning with LEVI-S and ending with an English word starting with the letter F were registered on January 29th by 37 individuals spread across China, none of whom appear to have registered domain names in the past. The term LEVI-S has no obvious significance and the domain names do not appear to be in use.

I sent an email to the 37 email addresses listed in the WHOIS contact information asking for more details, but have not received a single response.

Is the Tech Community’s Preference for Namecheap a Sign of Things to Come?

Earlier this month I wrote a post about which domain registrars HackerNews users prefer. Among that audience, Namecheap came in first (35%), GoDaddy second (27%), and Gandi a distant third (6%). There was a lot of great follow-on discussion in the HackerNews comments about pros and cons of the various registrars that’s worth reading.

Several readers asked about the overall statistics, that is not just HackerNews visitors but everyone who has chosen a registrar on Lean Domain Search.

Here are those numbers from launch (Jan 16, 2012) through today (Dec 18, 2012):

45% of Lean Domain Search users who were prompted with the “Please select your preferred registrar” dialog chose GoDaddy as their registrar, followed by Namecheap (20%), Gandi (3%), and Dreamhost (3%). 17% selected “None” and the remaining 12% selected something else (all less than 2% of the total).

GoDaddy may be the most popular registrar overall, but HackerNews users are 75% more likely to choose Namecheap than the average Lean Domain Search visitor (35% versus 20%).

If the preferences of the tech community are an early indication of overall trends, Namecheap could pose a significant threat to GoDaddy in the coming years. Regardless of the outcome, the competition will force all registrars to step up their game and that is certainly a good thing for all of us.

20% Of The Queries on Lean Domain Search Account for 77% of The Searches

I’ve been spending a lot of time lately trying to improve the quality of Lean Domain Search by decreasing the number of registered domain names that appear in the available search results. After all, there’s nothing more frustrating than getting your hopes up because you found a great name only to find out that it has already been registered.

The causes of the false-positives (the registered domains that appear in the available search results) are complicated, but there are things I can do to mitigate them. The primary mechanism I have for improving the quality of the results is a script that I continuously run that double-checks that the available search results are accurate. If the script comes across an available domain name that is actually registered, it notifies Lean Domain Search so that that domain is not included in future search results.

The problem though is that the script is slow because it has to perform a WHOIS query for every domain that it needs to double-check. This lends itself to an interesting optimization problem: given that the script can only double-check so many results per day, which results do I check?

I could, for example, get a list of all the searches performed in the last few hours and then just go one by one through them and double-check the results. A better approach is to focus on the queries that people search for the most because inaccuracies in those results are going to affect more people than something that’s rarely searched for.
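Picking the most-searched queries first is straightforward to sketch. The version below assumes the search log is a plain list of query strings and that the daily budget is expressed as a number of queries, which simplifies how the real script is budgeted; the function name is hypothetical.

```python
from collections import Counter


def queries_to_recheck(search_log, daily_budget):
    """Return the `daily_budget` most frequently searched queries, most popular first."""
    counts = Counter(search_log)
    return [query for query, _ in counts.most_common(daily_budget)]


log = ["task", "task", "pizza", "task", "blog", "pizza"]
print(queries_to_recheck(log, 2))  # ['task', 'pizza']
```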

The question then becomes how many of the most popular queries should I have the script check?

By performing a little kung fu with the analytics data I can get a much better idea of how to allocate my resources:

The x-axis shows the percentage of queries being taken into account; the y-axis shows the percentage of overall searches that those queries account for.

Some interesting results:

  • The top 10 queries (or 0.11% of all the queries) account for more than 4% of the searches performed
  • The top 1% of the queries account for 42% of the searches performed
  • The top 10% of queries account for 70% of the searches performed
  • The top 20% of the queries account for 77% of the searches performed (noted by the red lines in the chart)

This last result is particularly interesting because it conforms to the Pareto Principle, also known as the 80-20 rule, which says that for many events 80% of the effects come from 20% of the causes. Examples include 80% of the land in an area being owned by 20% of the population, 80% of a company’s sales coming from 20% of its customers, etc. In this case, this distribution follows suit: 77% of the searches come from 20% of the queries.
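A number like “the top 20% of queries account for 77% of searches” can be computed from a raw search log with a short Python function. This is only a sketch with a toy log; the function name and input format are assumptions.

```python
from collections import Counter


def share_of_searches(search_log, top_fraction):
    """Fraction of all searches accounted for by the top `top_fraction` of distinct queries."""
    counts = sorted(Counter(search_log).values(), reverse=True)
    if not counts:
        return 0.0
    top_n = max(1, round(len(counts) * top_fraction))
    return sum(counts[:top_n]) / sum(counts)


# Toy log: five distinct queries, one of which dominates.
log = ["task"] * 6 + ["pizza", "blog", "shop", "travel"]
print(share_of_searches(log, 0.2))  # 0.6
```

Calling it for a range of fractions from 0 to 1 gives the points needed to draw a curve like the one described above.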

Using this information I can focus the script that double-checks the results on queries that are going to affect the most people which in turn is going to create a better experience and hopefully higher conversion rates.

Data is fun (and profitable!). 🙂

25% of All New Queries on Lean Domain Search Have Never Been Searched For Before

A few years back I came across an almost unbelievable statistic: between 20% and 25% of daily searches on Google have never been searched for before. Google has been around for almost 15 years at this point and gets a few billion searches per day so for more than 1 in 5 queries to be brand new — never searched before, ever — it’s really quite amazing.

I’ve been running Lean Domain Search for almost 11 months now and was curious to see how its numbers compared to Google’s.

I started by exporting a list of queries performed each day via MixPanel’s Ruby API. I then wrote a script to go day by day and check how many searches performed that day had never been performed before (either on that day or previous days). Casing and the presence of spaces were ignored so that “webdesign” and “Web Design” were treated the same.
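The day-by-day check can be sketched roughly as follows in Python. The export step is left out, and the input format (a chronological list of days, each with its list of queries) is an assumption for the example.

```python
def normalize(query):
    """Ignore casing and spaces so "webdesign" and "Web Design" are treated the same."""
    return "".join(query.lower().split())


def new_query_share_by_day(searches_by_day):
    """For each day, the fraction of searches never seen earlier (that day or before)."""
    seen = set()
    shares = {}
    for day, queries in searches_by_day:
        new = 0
        for query in queries:
            key = normalize(query)
            if key not in seen:
                seen.add(key)
                new += 1
        shares[day] = new / len(queries) if queries else 0.0
    return shares


log = [
    ("day 1", ["webdesign", "Web Design", "pizza", "task"]),
    ("day 2", ["pizza", "blog"]),
]
print(new_query_share_by_day(log))  # {'day 1': 0.75, 'day 2': 0.5}
```

Because a repeat within the same day counts as a duplicate, even the very first day comes in below 100% new.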

Even though I knew Google’s 20%-25% number, I felt like things would be different. Google is a general purpose search engine; Lean Domain Search is a domain name generator. Surely the difference in purpose would be reflected in the numbers. Nope.

Here’s the daily average:

Here’s a weekly moving average:

On the day of the HackerNews launch (Jan 16, 2012) half of the queries had never been searched for before. Why half and not 100%? Because during the day as more and more HackerNews visitors performed searches it became less and less likely that subsequent searches would be unique. By the end of the day 15,901 searches had been performed, but 7,954 were duplicates of other searches that had been performed that day.

Over the next month the number dropped from 49% new to 40% to 36% and then down to the mid-20s. What’s really amazing is that it’s hovered around that point ever since then despite the fact that more than 300,000 new searches have been performed since that time.

The overall daily average since launching is 24.75% meaning that 1 in every 4 searches has never been performed before. That was true in February, it’s true now, and if Google’s stats are any indication, it’s probably going to be true in a few years as well.