A/B testing for the masses. Here's an example: Say someone is deciding which s...

travisfischer · on May 4, 2014

This is actually pretty interesting. I think many people have thought of a product along these lines and, as mentioned by others, there have been several versions of this tried.

I think the key is in the constraints you have outlined.

Obviously it's a mobile app. You take two pics and post. You then vote on others' ABs. Every vote you make on someone else's AB earns you a vote on your own. If you want a bunch of opinions on your AB, you keep voting on others'. You can post to your social networks to get your friends' opinions as well. Maybe it ties into the social APIs and reads comments to extract votes for A or B. Maybe some viral growth potential there.

I like this. I might actually use it. I'm wondering whether it is important to be able to choose a target segment of voters for your own AB, or whether the classification of "human" is good enough in most cases.
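To make the vote-for-a-vote rule concrete, here is a minimal sketch of how the bookkeeping might work. Everything in it is an assumption for illustration: the `VoteLedger` class, its method names, and the rule that polls are served in order of their owner's credit balance are hypothetical, not anything an existing app is known to do.

```python
from collections import defaultdict


class VoteLedger:
    """Hypothetical ledger: each vote cast earns the voter priority for their own poll."""

    def __init__(self):
        self.credits = defaultdict(int)  # user -> votes cast minus votes received
        self.polls = {}                  # poll_id -> owner, tallies, and who has voted

    def post(self, owner, poll_id):
        self.polls[poll_id] = {"owner": owner, "A": 0, "B": 0, "voters": set()}

    def next_poll_for(self, voter):
        """Pick the open poll whose owner currently has the highest credit balance."""
        candidates = [
            (self.credits[poll["owner"]], poll_id)
            for poll_id, poll in self.polls.items()
            if poll["owner"] != voter and voter not in poll["voters"]
        ]
        return max(candidates)[1] if candidates else None

    def vote(self, voter, poll_id, choice):
        poll = self.polls[poll_id]
        if choice not in ("A", "B"):
            raise ValueError("choice must be 'A' or 'B'")
        if voter == poll["owner"] or voter in poll["voters"]:
            raise ValueError("voter may not vote on this poll")
        poll[choice] += 1
        poll["voters"].add(voter)
        self.credits[voter] += 1          # the voter earned a vote on their own poll
        self.credits[poll["owner"]] -= 1  # the owner has now received one


ledger = VoteLedger()
ledger.post("alice", "p1")
ledger.post("bob", "p2")
ledger.vote("alice", "p2", "A")  # alice votes on bob's poll, so her poll moves up the queue
```

Letting balances go negative is a deliberate choice in this sketch: it avoids a cold start where nobody can receive votes because nobody has earned any yet.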

Fizzer · on May 4, 2014

I've thought about segmenting as well. For example, teenagers don't want to rate pictures of grandmothers all day -- they want to see people their own age. You could segment it by age and region for starters.

wingerlang · on May 4, 2014

I mentioned this idea to a client some time ago, exactly like this. Each vote gives you a vote on your own pictures. It was not in the scope of the product but I do still like the idea.

ctb9 · on May 4, 2014

There is a strong assumption here that you could encourage hundreds of people to express their opinion on someone's shirts.

The value for the poster is clear; what is the value for the voter?

Fizzer · on May 4, 2014

Every time someone uses the service, they spend 30 seconds rating other people's images. There can be a countdown (i.e. rate 10 more to see your results).

It would also be important to make it very fast. Cache all the images before the voter gets there, and send the results asynchronously so they don't need to wait. One could easily do a rating in less than a second. It's all snap judgements.

utunga · on May 4, 2014

I'm just some guy, you know? But this sounds awesome to me. Compared with Fashism (really?), which seems to have been about feedback on your fashion more generally, and Pickfu, which seems to address feedback more generally still, maybe the trick would be to really narrow it down.

Narrow it to just an app for two photos, asking which is better, and let people make of it what they will. It would need decent Facebook integration, so you can ask your friends, not just random strangers.

lpolovets · on May 4, 2014

One of my friends started a website that mostly fits what you're describing: http://www.pickfu.com/

Fizzer · on May 4, 2014

It does seem somewhat similar. However, Pickfu is going for written feedback from one person (or a small number of people), whereas the A/B testing idea would simply compare two photos and aggregate the results within seconds.

Pickfu also charges for their service, which I don't think would work for someone wanting to simply choose their wardrobe. It'd have to monetize in other ways (ads, or charging for gender/age breakdowns of the results, more results, comparisons of more than 2 photos, etc.)

fudged71 · on May 4, 2014

>within seconds

This is the part I don't understand. You would need to have a whole lot of scale and incentive for people to give their feedback this fast and often. Jelly can take a while to get a response, for instance.

Fizzer · on May 4, 2014

Every time you upload your own photos, you are forced to rate other people's photos before you get your own response. This is the incentive.

Rating photos is very fast -- just look at two photos and tap one. All network lag can be eliminated by pre-caching the images. I'd bet the average rating speed is one per second (it's all snap judgements).

As long as the app has 30+ people using it at every moment of the day, you'll always get at least 30 responses within 30 seconds. Plus I suspect people would enjoy voting and would spend time doing it even while not waiting on their own response, meaning you'd get more than you put in.
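The 30-in-30 estimate above can be written out as back-of-the-envelope arithmetic. This is only a sketch under assumptions that are not in the comment: the `expected_votes` function and its `open_polls` parameter are hypothetical, votes are assumed to be spread evenly across open polls, and each voter is assumed to vote at most once per poll.

```python
def expected_votes(active_voters, votes_per_second, open_polls, seconds):
    """Rough estimate of votes one poll receives, assuming an even spread across polls."""
    # Total votes the crowd can produce in the time window.
    supply = active_voters * votes_per_second * seconds
    # Share of that supply landing on a single poll.
    per_poll = supply / open_polls
    # Each voter votes at most once per poll, so the crowd size is a hard cap.
    return min(active_voters, per_poll)


# 30 voters at one vote per second, 30 polls competing for attention, 30 seconds.
estimate = expected_votes(active_voters=30, votes_per_second=1.0, open_polls=30, seconds=30)
```

Under these assumptions the estimate only holds while the number of open polls stays at or below the number of votes each person casts in the window; with ten times as many open polls, the same crowd yields a tenth as many votes per poll.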

User9821 · on May 4, 2014

You don't want to force people to do anything, otherwise it skews the results. Force someone to vote for 30 seconds, or to vote 10 times, and they'll just sit there tapping the first picture until they reach their target. You'll get lots of votes, they'll just be useless.

I think you'd get enough votes if the app was in the right format. It would have to be like hot or not, where you have 2 photos, pick one, and instantly get another 2 photos along with the results from the first set. This gets people trapped in the just one more click mindset.

Focus it on fashion and clothing: people will vote to see more photos, to judge the outfits of others (a lot of people enjoy doing this for fun), and to get inspiration for themselves.

Also, allow users to select what gender and age groups can vote on their photo.

Fizzer · on May 4, 2014

>they'll just sit there tapping the first picture until they reach their target. You'll get lots of votes, they'll just be useless.

Some people would do this, but as long as you randomize the order of the pictures, their data won't alter the winner, since each image would get a roughly equal number of bad votes.
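A small simulation shows the cancellation idea. It is a sketch under stated assumptions: the `simulate` function is hypothetical, honest votes are given as fixed counts, and a lazy voter is modeled as someone who always taps whichever picture is shown first.

```python
import random


def simulate(honest_for_a, honest_for_b, lazy_voters, seed=0):
    """Tally honest votes plus lazy voters who always tap the first slot."""
    rng = random.Random(seed)  # seeded so repeated runs give the same tally
    tally = {"A": honest_for_a, "B": honest_for_b}
    for _ in range(lazy_voters):
        order = ["A", "B"]
        rng.shuffle(order)       # randomize which picture appears first
        tally[order[0]] += 1     # the lazy voter taps the first slot regardless
    return tally


tally = simulate(honest_for_a=150, honest_for_b=20, lazy_voters=160)
winner = max(tally, key=tally.get)
```

The lazy votes only cancel on average. With 160 of them, the gap they add between A and B has a standard deviation of about 13 votes, so a small honest margin can still be swamped by chance.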

You can also detect when people are doing this and start throwing out their votes, or just don't even prompt them to vote and show them ads instead. I'm pretty sure this would be a minority, since most people would understand that they want to get real results for their own photos, so they need to give real votes to other photos.
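Detecting this could be as simple as looking at which slot a voter taps once the order is randomized. The sketch below is one hypothetical heuristic, not a tested fraud filter: the function name, the minimum of 20 votes, and the 90% threshold are all assumptions.

```python
def looks_like_slot_tapper(tapped_slots, min_votes=20, threshold=0.9):
    """Flag a voter who almost always taps the same on-screen slot.

    tapped_slots holds 0 (first picture) or 1 (second picture) for each vote.
    With randomized order, an honest voter should land near 50/50.
    """
    if len(tapped_slots) < min_votes:
        return False  # too little history to judge
    dominant = max(tapped_slots.count(0), tapped_slots.count(1))
    return dominant / len(tapped_slots) >= threshold


always_first = [0] * 25
mixed = [0, 1] * 15
```

Here `always_first` would be flagged and `mixed` would not. A voter who taps randomly without looking would slip past this check, so it only catches the simplest pattern.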

lgas · on May 4, 2014

You could offset the "tap the first picture" effect by e.g. randomizing the order the pictures are shown, weighting the votes of serial "top clickers" less, etc.

User9821 · on May 4, 2014

Then you're just applying random votes, instead of actual votes.

For example, something that should be 15-2 votes, ends up being 95-82. You'll be applying a large number of votes to both sides and pushing everything towards a 50/50 rating. This doesn't help anyone: the goal is A/B testing, and you're making it more difficult to get accurate data. 15-2 shows a lot of promise for A, but then you add 80 random votes to both sides, and 95-82 seems like a tie.
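The dilution in that example is easy to check. The helper names below are made up for illustration; the numbers are the ones from the comment.

```python
def share_of_a(votes_a, votes_b):
    """Fraction of all votes that went to A."""
    return votes_a / (votes_a + votes_b)


def add_noise(votes_a, votes_b, noise_per_side):
    """Add the same number of random votes to both sides."""
    return votes_a + noise_per_side, votes_b + noise_per_side


clean = (15, 2)
noisy = add_noise(*clean, noise_per_side=80)

print(noisy)                          # (95, 82)
print(round(share_of_a(*clean), 3))   # 0.882
print(round(share_of_a(*noisy), 3))   # 0.537
```

The absolute margin stays at 13 votes in both cases; what collapses is A's share, from about 88% to about 54%.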

lgas · on May 5, 2014

Well, I didn't really outline a specific approach. I was just trying to suggest that there are techniques you could apply to the data after collection to solve the problem without modifying the fundamental premise of the app. For example, if you "weighted the votes of the serial top clickers less" in addition to randomizing the order of the pictures shown, so that each serial top clicker's vote counts as 0.1 instead of 1, then suddenly your 95-82 might move closer to the 15-2 and leave you with something like 25-12, which isn't as clear as 15-2 but is clearly significant.

Instead of showing the user 95-82 or 25-12, you tell the user "others preferred shirt A 2-to-1" or "shirt B 66% to 33%" or whatever might be appropriate.

Anyway, again, I'm not suggesting any particular techniques are the right ones, just that there are (almost certainly) viable techniques available to mitigate the problem.
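As one illustration of the down-weighting idea, here is a minimal sketch using the numbers from this subthread: 15 and 2 honest votes, plus 80 serial top-clicker votes on each side counted at 0.1. The function names, the voter IDs, and the wording of the summary string are assumptions made for the example.

```python
def weighted_tally(votes, weights, default_weight=1.0):
    """Sum votes per option, counting flagged voters at a reduced weight."""
    totals = {"A": 0.0, "B": 0.0}
    for voter, choice in votes:
        totals[choice] += weights.get(voter, default_weight)
    return totals


def summarize(totals):
    """Report the result as a percentage split instead of raw counts."""
    total = totals["A"] + totals["B"]
    if total == 0:
        return "no votes yet"
    winner = "A" if totals["A"] >= totals["B"] else "B"
    share = round(100 * totals[winner] / total)
    return f"others preferred {winner}, {share}% to {100 - share}%"


# Honest voters: 15 for A, 2 for B.
votes = [(f"honest{i}", "A") for i in range(15)]
votes += [(f"honest{i}", "B") for i in range(15, 17)]
# Serial top clickers: randomized order splits them 80/80.
votes += [(f"lazy{i}", "A") for i in range(80)]
votes += [(f"lazy{i}", "B") for i in range(80, 160)]

weights = {f"lazy{i}": 0.1 for i in range(160)}
totals = weighted_tally(votes, weights)
message = summarize(totals)
```

With these inputs the weighted tally comes out at roughly 23 to 10, in the same neighborhood as the "something like 25-12" above, and the summary reports about a 70/30 split for A.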

Fizzer · on May 4, 2014

>For example, something that should be 15-2 votes, ends up being 95-82.

It's all in the presentation. If you just highlight one image and stamp "WINNER" next to it, most people won't even look at the numbers. Crowning a winner is more important than being scientifically accurate.

fudged71 · on May 5, 2014

>Then you're just applying random votes, instead of actual votes.

I suppose the question then becomes: will the end user notice or care if the votes are random? Do the votes need to come from humans at all?

minimaxir · on May 4, 2014

There was a startup that followed your example called Fashism. (yes, really)

It received venture capital, but died a painful death.

http://www.crunchbase.com/organization/fashism

Fizzer · on May 4, 2014

Interesting. Sounds like it does share some ideas, but with a different goal. This blurb says it was trying to get people to compete for a spot on a leaderboard, whereas A/B testing is about helping people make everyday decisions.

minimaxir · on May 4, 2014

I think the leaderboard was the original idea, but then they later pivoted.

http://betabeat.com/2013/09/fashism-struts-off-to-startup-gr...

Fizzer · on May 4, 2014

That's closer! But the devil is in the details. They have you looking at one image, reading the question, and rating up/down and even commenting. In the image, they have a full body shot, with simply "Does this work?" Does what work? The glasses, the shoes, the dress?

This takes a lot more effort than simply tapping one of two photos without reading anything. A/B responses can be given in less than a second. It's all snap judgements -- no reading is necessary.

sebastianconcpt · on May 4, 2014

Disturbing naming ethos. Not even Demi Moore's charm backing them up could withstand that.

Suddenly... #HopeInHumanityRestored

eurleif · on May 4, 2014

It's limited to pictures of people, but MyBestFace sounds like what you're describing. https://www.okcupid.com/mybestface

Fizzer · on May 4, 2014

You're right, the methodology is similar, just for different goals. That one is trying to figure out which picture to use for your profile, whereas this app is trying to help you make everyday decisions such as what to wear or what to buy. A key element is the response time -- I'm not sure how fast this service gets you an answer, but I'm assuming it's not within seconds.

eurleif · on May 4, 2014

Well, their slow response time is presumably because that's the best they can do... How would you improve on it?

Fizzer · on May 4, 2014

The idea as I presented it would require a certain amount of traction to get started. It would only take about a second to look at two photos and tap one, so assuming 30+ people are using it at any given moment, you could get 30 votes within 30 seconds.

DanBC · on May 4, 2014

Mybestface uses a really small number of responses.

I'd pay to use something that gave me a sample of 500 or 1000 responses.

_wk3u · on May 4, 2014

There (was) an app for that. They were around for a few years based out of NYC but it looks like they didn't get much traction.

http://gotryiton.com/

wdrevno · on May 4, 2014

This sounds a lot like what http://thumb.it/ does. It works pretty well actually, and the responses arrive within a couple of minutes.

Fizzer · on May 4, 2014

It does sound similar. Just from watching the video, it looks like thumb.it just has up/down votes on one image, with optional additional feedback.

The advantage of having two images is the voter doesn't even need to read a description or know what part of the image they're supposed to up/down vote. They should be able to easily see the difference between the two and just make a snap decision in less than a second. Not having to read text makes it more fun, I would think, and also provides a lot more results for the picture-taker.

KnightHawk3 · on May 4, 2014

What happens if both of them are told they are bad? (or you are just not very attractive)

Fizzer · on May 4, 2014

Voters just select which picture is better, so the only thing being tested is the difference between the two photos.

_qhtn · on May 4, 2014

Eating two turds is worse than eating one turd. I'd rather not eat any turds at all.