Monday, November 2, 2009

Matching Algorithm

Because I was the Acting CTO for eHarmony at its start, I quite often get introduced to people who have an idea for a startup company that is based on some kind of matching algorithm.  They describe the company as the eHarmony of careers, clothes, jobs, college, tutoring services, doctors, service companies, investments, etc.  In fact, you can get a good idea of these various things by just searching on Google for "eHarmony of" startup.

Each of these startup ideas has at its core a matching algorithm that reduces the friction between a consumer and some need.  In eHarmony's case, it was the friction of finding a compatible marriage partner.  In the case of college matching, it's the complexity of figuring out which college best matches the person's needs.

As I've had more than a hundred conversations with entrepreneurs who are planning to build a company around a matching algorithm, I thought it would be worth capturing a few thoughts that commonly come up.

Margin in Mystery

An angel investor who used to be on a startup CEO roundtable with me always had a lot of great phrases that would help startups.  One of my favorites was, "There's margin in mystery."  What he meant by this is that anything that's too obvious to the consumer can be easily evaluated for its value and often then suffers from low margins.  The flip side is that if it's not at all clear how you are doing what you offer, then people perceive greater value.

An example that relates directly to creating a matching algorithm is the classic Myers-Briggs Type Indicator.  The experience is often that after you've answered a whole bunch of questions, it comes back with a description of you that seems eerily accurate.  For example, after I took a similar personality test, the person who came to give me feedback walked into the room with "Tony, you like going to bookstores, don't you?"  And I certainly do, but the test never asked me anything about bookstores.  The feedback had lots of other items that somewhat "nailed me."

Certainly eHarmony relies on this.  They come back with their free personality profile that nails you.  It gives you confidence that they understand you and what would make a good potential marriage partner.  And what they tell you about how you will be in a relationship does not directly come from any questions they asked you.  That feels powerful.  It's a bit mysterious.

Now, if the assessment that I took had asked me – "Do you like to go to bookstores?"  and the assessment echoed what I had answered to that question, then the algorithm is obvious.  There's no mystery.  And I will perceive lower value.

This is really important when it comes to a matching algorithm, because I'm often presented with a matching algorithm that really isn't a matching algorithm at all.  It's really just a simple filtered search.  For example, if you are going to be building the best matching algorithm for high school students looking to find the right college, but it is based on criteria that amount to a search (geographical location, majors offered), no one is going to ascribe greater value to it.  You may have a perfectly fine business, but it's not going to be differentiated based on that simple algorithm.

Instead, what you need to have is something like leveraging personality characteristics of students who have been successful at that college, or maybe common life ambitions of students who say they are happy there, or ???
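To make the difference concrete, here is a minimal sketch in Python.  Everything in it is hypothetical: the College record, the trait names, the thriving_profile numbers, and the choice of cosine similarity are illustrative assumptions, not eHarmony's method or anyone's real data.  The first function is the filtered search; the second scores a student against the profile of students who did well at each college.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class College:
    name: str
    state: str
    majors: set
    # Hypothetical: average trait scores (0 to 1) of students who thrived there.
    thriving_profile: dict


def filtered_search(colleges, state, major):
    """Plain filter: it only echoes back the criteria the student typed in."""
    return [c for c in colleges if c.state == state and major in c.majors]


def match_score(student_traits, college):
    """Cosine similarity between the student's traits and the thriving profile."""
    keys = sorted(student_traits.keys() & college.thriving_profile.keys())
    a = [student_traits[k] for k in keys]
    b = [college.thriving_profile[k] for k in keys]
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_matches(colleges, student_traits):
    """Order colleges from best to worst match for this student."""
    return sorted(colleges, key=lambda c: match_score(student_traits, c), reverse=True)


# Made-up example data.
colleges = [
    College("Alpha College", "CA", {"biology", "physics"},
            {"curiosity": 0.9, "competitiveness": 0.3, "sociability": 0.5}),
    College("Beta University", "CA", {"biology", "business"},
            {"curiosity": 0.4, "competitiveness": 0.9, "sociability": 0.8}),
]
student = {"curiosity": 0.8, "competitiveness": 0.2, "sociability": 0.6}

filtered = filtered_search(colleges, "CA", "biology")  # both colleges pass the filter
ranked = rank_matches(colleges, student)               # the profile match separates them
```

Both colleges survive the filter, so the filter alone tells the student nothing new.  The ranking is where any mystery would come from, and it only works if you actually have the thriving_profile data.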

Requires Data

You will notice that the suggestions I made on how to increase the mystery in a matching algorithm also just created a need for data.  In order to successfully match students to colleges where they will like it and be successful, you probably need a lot of data about students who have already attended that college: whether they liked it and were successful, plus their personality profile, life ambition, or whatever else you plan to use to match people.

A lot of people I talk to about their matching algorithm don't know that eHarmony (more specifically Neil Clark Warren) had years of scientific research that were the basis of his dimensions of compatibility.  These dimensions related to personality, values, likes/dislikes, etc.  And for each of these dimensions they had been doing years of research to determine what combinations produced happy/unhappy marriages as well as how long the marriages lasted.  It was a lot of data that was distilled down into a fairly complex matching algorithm.

Many of the startups that I talk to don't have any of that kind of data.  You can maybe find a researcher or something to use as a proxy.  You can make educated guesses.  You can start to collect the data as part of the system.  But without the foundation, you are likely going to have trouble creating something that will truly have mystery.

That said, I will admit that there have been a few ingenious entrepreneurs who had a matching algorithm that had mystery and no data yet appeared to be fairly valid and valuable.

Broad Value and Appeal

My guess is that if I discussed this with fellow computer scientists, they would argue that many of the ideas I'm calling a matching algorithm really are not strictly matching algorithms.  While strictly speaking that's true, I think it misses the broad value and appeal that this kind of algorithm provides.  Whether or not they are truly matching algorithms, they have broad value and appeal because they apply anywhere that there's a fairly large set of options, complexity to those options, and friction in reducing the set and dealing with the complexity.

This is why there are so many "eHarmony of" companies. 

And I actually think this is going to grow significantly!

If you think about what's going on with the web, we've reached a point where everyone and everything is connected, it's represented online, it's creating content.  The numbers are growing rapidly.  The amount of data we have about it is growing rapidly.  Our choices are growing.

Yet most of us are much happier with fewer choices.  Actually, that's not strictly accurate.  We are happier with a smaller list of well vetted, reasonable choices.

A matching algorithm is at the heart of how you deal with scale and complexity.

It comes up all the time. 

  • Who should I meet here in Los Angeles professionally that will be an interesting conversation and a potentially valuable contact?
  • What entrepreneurs / startup companies would benefit from talking to me?
  • Who should read my blog?  Whose blog should I read?

And, again, while the line is fuzzy, this also all relates to issues like social filtering and Curator Editor Research Opportunities on eLearning Learning.  The challenge is how you can help a consumer make sense of a large complex space.  This is only going to get more interesting.

And, because of the web, there are all kinds of new sources of the data that a matching algorithm can use.

6 comments:

Cliff Allen said...

It's interesting how much of everyday life depends on matching one thing with another.

I agree that simple filtering based on what someone has already told you doesn't have the mystique that a matching algorithm generates. In the early days of artificial intelligence the "expert systems" used an algorithm called an inference engine to convert a series of answers into intelligent-looking recommendations. They were intelligent recommendations because a human had figured out all of the recommendations ahead of time. Those systems were fun to play with, but hard to turn into practical applications.

When Amazon started tweaking the collaborative filtering personalization software from NetPerceptions, Amazon's programmers added mystique to book recommendations. Unfortunately, only a small number of e-commerce companies have been able to implement collaborative filtering to provide a good one-to-one marketing experience.

These days, it has become easy to collect a lot of behavioral data about people on the Internet. As it was for Amazon, the challenge for entrepreneurs is to turn behavioral data about activities, like which products people buy and which events they attend, into appropriate recommendations for other products to buy and events to attend.
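As a rough illustration of turning behavioral data into recommendations, here is a minimal Python sketch of item-based collaborative filtering by co-occurrence.  It is a simplified stand-in, not the NetPerceptions or Amazon software; the function names and the purchase and attendance histories are made up for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(histories):
    """Count how often each pair of items appears in the same person's history."""
    co = defaultdict(Counter)
    for items in histories.values():
        for a, b in permutations(set(items), 2):
            co[a][b] += 1
    return co


def recommend(histories, user, top_n=3):
    """Suggest unseen items that most often co-occur with what the user already has."""
    co = build_cooccurrence(histories)
    seen = set(histories[user])
    scores = Counter()
    for item in seen:
        for other, count in co[item].items():
            if other not in seen:
                scores[other] += count
    return [item for item, _ in scores.most_common(top_n)]


# Hypothetical histories: products bought and events attended.
histories = {
    "ann": ["ai_textbook", "startup_guide", "la_tech_mixer"],
    "bob": ["ai_textbook", "startup_guide"],
    "cat": ["ai_textbook", "la_tech_mixer", "pitch_night"],
    "dan": ["ai_textbook"],
}

dan_recs = recommend(histories, "dan")
```

Nobody asked dan what he wants; the suggestions come entirely from what similar people did, which is where the mystique comes from.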

Tony Karrer said...

Cliff - great point about what Amazon is doing. Also Netflix and MyShape.

I should dust off my old Artificial Intelligence textbooks. I studied and built expert systems and inference engines way back when. I bet a lot of it still applies.

My guess is that in the domain of SureToMeet there are some interesting matching problems.

Anonymous said...


" ... years of scientific research that were the basis of his dimensions of compatibility ..."


Where exactly is that research?

eHarmony is only supported by a big marketing budget and not by serious scientific evidence.

eHarmony DOES NOT HAVE any peer-reviewed Scientific Paper by Academics (public scrutiny of findings) from different Universities showing eHarmony's matching algorithm can match prospective partners who will have more stable and satisfying relationships than couples matched by chance, astrological destiny, personal preferences, searching on one's own, or other technique as the control group.




"... data that they was distilled down into a fairly complex matching algorithm. ..."



fairly complex matching algorithm ???

eHarmony has been always the same:
1) Big5 to assess personality.
2) Dyadic Adjustment Scale (invented by Dr. Graham B. Spanier in 1976) to calculate compatibility (similarity) between prospective mates.
3) Guided Communication Process as an appendix of its main matching algorithm. The Guided Communication Process is a mutual filtering step.

You can use a 1982 Commodore 64 or ZX Spectrum computer to calculate Dyadic Adjustment Scale between prospective mates.


If eHarmony has 20,000,000 active members
and
suppose eHarmony is responsible for 300,000 marriages since 2001
and 700,000 dyads in long-term relationships.

eHarmony's Success Rate is only 2,000,000 persons (1,000,000 couples x 2) / 20,000,000 == 0.1

eHarmony's Success Rate == 10%

90% of eHarmony's members are going to fail in finding someone highly compatible!
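The arithmetic behind that 10% figure can be checked with a few lines of Python.  This is only a sketch that takes the numbers supposed above at face value; the variable names are illustrative.

```python
# Figures as supposed in the comment above (not verified data).
active_members = 20_000_000
marriages = 300_000
long_term_couples = 700_000

# Each marriage or long-term relationship involves two members.
people_in_couples = 2 * (marriages + long_term_couples)

success_rate = people_in_couples / active_members
failure_rate = 1 - success_rate

print(f"{people_in_couples:,} persons, success rate {success_rate:.0%}")
```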

Is eHarmony only a BIG HOAX ?

----------------------------------------------------------------

Online Dating Sites can be classified as:

Online Dating 1.0: First Generation "Browsing/Searching Options, Powerful Searching Engine"

Online Dating 1.5: Hybrid; "Unidirectional Recommendation Engine", sites like HotorNot.

Online Dating 2.0: Second Generation "Matching based on Self-Reported Data / Bidirectional Recommendation Engine" e.g. PerfectMatch, uses an ipsative instrument based on MBTI test.

Online Dating 3.0: Third Generation "Compatibility Matching Algorithms" e.g. eHarmony, uses a normative version of the Big5 to assess personality and Dyadic Adjustment Scale to calculate compatibility. The U.S. questionnaire is different from the UK site, Australian site, Canadian site, etc.



However, eHarmony is in the range of 3 or 4 persons highly compatible (who select each other) per 1,000 persons screened!!! So in a 10,000,000 persons database, one person will see 30,000 to 40,000 persons as highly compatible; 30,000 persons is the population of an average small city!!!
Any person can achieve 3 or 4 persons highly compatible per 1,000 persons screened in a big database by searching on one's own or by mutual filtering!
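The scaling from "3 or 4 per 1,000" to "30,000 to 40,000" is simple proportion, sketched here in Python.  The function name is illustrative, and the rate is the one claimed in this comment, not a measured value.

```python
def expected_matches(database_size, matches_per_thousand):
    """Number of 'highly compatible' persons at a given rate per 1,000 screened."""
    return database_size * matches_per_thousand // 1000


low = expected_matches(10_000_000, 3)
high = expected_matches(10_000_000, 4)
print(f"{low:,} to {high:,} persons")
```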


Actual Online Dating Sites == searching on one's own == mutual filtering / bidirectional recommendation engines == compatibility matching algorithms, they are all like piston engines,
in the range of 3 or 4 persons highly compatible (who select each other) per 1,000 persons screened.



WorldWide, there are over 5,000 (five thousand) online dating sites, but:
No one is using the 16PF5 normative personality test, available in different languages to assess personality of their members, or a proprietary test with exactly the same traits of the 16PF5.
No one is using a quantized pattern comparison method (part of pattern recognition by cross-correlation) to calculate similarity between prospective mates.




Regards.
Fernando Ardenghi.
Buenos Aires.
Argentina.
ardenghifer@gmail.com

Tony Karrer said...

Fernando - thanks for the comments. Seems like you aren't happy with eHarmony to some extent. Not sure why.

eHarmony did come in with considerable research that formed the basis of the algorithm. I don't know if they ever published any of this.

I can't comment on what you claim the basis for the algorithm is, but I can tell you about general complexity and that you are missing a couple of really important aspects of complexity.

The complexity here stems from the fact that this is not a simple 1-to-1 matching. A person with a particular profile doesn't just match onto people with the same profile. It's much more complex than that. It really is an algorithm. BTW, that's where the research plays into it.

The other aspect of this is that it has to be very scalable. They handle matching millions to millions every night.

You lost me on how you are defining success and the meaning of your stats. eHarmony from the beginning defined success based on the success of the resulting relationship/marriage.

Cliff Allen said...

Yes, we're looking at how to improve our event recommendations.

I'll bet you're right that some "old" AI code could be used today. Processor speed was one problem with expert systems -- they couldn't keep up with a call center rep, etc.

A blog you might like to read is Geeking with Greg (http://glinden.blogspot.com/). He was at Amazon in the early years and later wrote a great personalized news site.

Anonymous said...

The entire Online Dating Industry for serious daters, in 1st World Countries, is a big hoax.
* No Legislation.
* No Quality Norms.
* Unreliable background checks.
* Divorce rates are still very, very high.
* No actual online dating site offering a compatibility matching method has published any Scientific Paper peer reviewed by Academics (public scrutiny of findings) from different Universities showing its matching algorithm can match prospective partners who will have more stable and satisfying relationships than couples matched by chance, astrological destiny, personal preferences, searching on one's own, or other technique as the control group.
* Success rates of them are less than 10%. The majority of their members are not going to achieve a long term relationship with commitment (or marriage).
* All the algorithms used by eHarmony, True, Chemistry, PerfectMatch, Be2, Meetic, PlentyOfFish, Parship, RewardingLove, MyType, etc. are like placebo, because they will show, to any member, 3 or 4 persons as highly compatible per 1,000 persons screened, so in a 10,000,000 persons database, any member will see 30,000 to 40,000 members as highly compatible; 30,000 persons is the population of an average small city. Any person can achieve 3 or 4 persons as highly compatible per 1,000 persons screened, searching on his/her own or by mutual filtering methods.
* All the algorithms used by them are like piston engines, they cannot break "the online dating sound barrier".

Breaking "the online dating sound barrier" is to achieve far better precision than searching on one's own or mutual filtering.


Breaking "the online dating sound barrier" is to achieve at least:
3 most compatible persons in a 100,000 persons database.
12 most compatible persons in a 1,000,000 persons database.
48 most compatible persons in a 10,000,000 persons database.


Latest Research in Theories of Romantic Relationships Development outlines: compatibility is all about a high level of personality similarity between prospective mates for long term mating with commitment.


The only way to achieve that is:

- using the 16PF5 normative personality test, available in different languages to assess personality of members, or a proprietary test with exactly the same traits of the 16PF5. The ensemble of the 16PF5 is 10^16 possible patterns (16 traits, each scored on a 10-point scale), a big number given that the world population is nearly 6.7 * 10^9.

- to express compatibility with eight decimals, e.g. the pattern 6.7.6.8.9.6.7.7.8.7.2.5.8.7.3.4 is 92.55033557% +/- 0.00000001% similar to the pattern 7.7.6.8.8.7.6.5.8.7.4.5.7.7.3.4
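For readers who want to experiment with comparing two 16-trait patterns, here is a minimal Python sketch.  It uses a plain normalized correlation at zero lag (cosine similarity) as a stand-in; the quantized pattern comparison method mentioned in this comment is not described here, so this sketch is an assumption and should not be expected to reproduce the 92.55033557% figure.  The function names are illustrative.

```python
from math import sqrt


def parse_pattern(text):
    """Turn a dotted pattern such as '6.7.6.8' into a list of integer scores."""
    return [int(part) for part in text.split(".")]


def normalized_correlation(a, b):
    """Zero-lag normalized correlation (cosine similarity) of two score lists."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same number of traits")
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm


# The two patterns quoted in the comment above.
p1 = parse_pattern("6.7.6.8.9.6.7.7.8.7.2.5.8.7.3.4")
p2 = parse_pattern("7.7.6.8.8.7.6.5.8.7.4.5.7.7.3.4")

similarity = normalized_correlation(p1, p2)
print(f"{similarity:.8%}")  # printed to eight decimals, as the comment suggests
```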




Regards,

Fernando Ardenghi.
Buenos Aires.
Argentina.
ardenghifer@gmail.com