Tony Jebara and Lee Dryburgh Discuss Sense Networks and Massive Telecom Opportunities

Towards the end of last month I found myself unexpectedly interviewing Tony Jebara of Columbia University and Sense Networks. It turned out to be something more conversational than an interview; the work of Sense Networks as articulated by Tony clearly got me excited.

If you work within an operator and don't feel bullish about the future, pay attention.

You can download it as a 96kbps MP3 here (45 meg, 1 hour 7 minutes).

Additionally, the full transcript is below. To distinguish between us I've indented Tony.


Good afternoon, Tony.

Hi, Lee.

How are you?

I'm doing very well, thank you.  And you?

I'm a little bit tired, because this is the third interview, today, and I think my throat is going to wear out and I'm going to become very over-caffeinated, and begin shaking, any time soon.

Okay, I'll pick up the slack if you want to ask short questions.  I can talk forever, on this stuff.

That sounds good to me.  Because of the lack of gaps, I never had time to prepare any questions for you, so my questions are going to be very elementary, to begin with.  We'll start with the most obvious; you are an associate professor of computer science at Columbia University and the Chief Scientist and Co-Founder at Sense Networks.  Sense Networks holds a lot of interest for me personally.  Would you care to give an outline of what Sense Networks is doing?

Okay, great - Sense Networks was founded by Greg Skibiski, Alex Pentland, Christine Lemke, and myself.  We basically realized that cell phones and location data from cell phones is a wonderful new opportunity to get information about what people are doing, what they're interested in, and to either get this personalized or aggregate data, which enables all sorts of what we call "offline modeling". 

For the past decade or two, the online world has become a wonderful place to model what people are doing, what they're interested in, because it's all online when you're typing into your computer, or when you're declaring things on Facebook or on the website. 

What we wanted to do was say, "Could we move this rich structure of networks, graphs, connections, and analytics off of the online world and into the offline world?"  We actually do spend a big chunk of our time there.  We forgot about that.

Because the mobile phone is always with us, even when it's not on, it's locating us, getting our location.  Even though we're not talking and giving actual communication information, we're revealing a lot about what we're doing in the offline world.  So, we view this as a wonderful bridge for getting real data in an unobtrusive way, about massive numbers of people, so we can do what the online world has done with the offline world.

For example, just like the online world has this wonderful network of virtual places called the Web, and ways of searching around these virtual places, we believe the phone is going to help us figure out the network of real places, and build a network or a web of the various restaurants, bars, and places you would go to, and to understand that network much the same way we're able to understand the network of the worldwide web, and to apply some of the same things we've been applying for various socially beneficial or commercially beneficial opportunities, that now work on the Web, but back in the real world, using this location sensor.

But this is - I hate to use the word "convergence", but this is the online world being applied to the offline world.  It may not be in your application example, but basically, when you look at things, you can retrieve information.  It's helping you in real world navigation so it's pulling data from the online world and mapping it to the offline world, correct?

Absolutely, and there are many different perspectives on this kind of data collection.  There are many uses of data.  For example, when we started three years ago, I was doing research in this general space, back in the early 1990's, along with Alex Pentland, where we were tracking people using computer vision and video cameras.  Of course, it's much harder to do than to track people using location and Wi-Fi positioning. 

In the past few years, our job became a lot easier and now we can start doing some of the things we dreamed about, ages ago, with the real location data that we're getting, which is super accurate, super precise.  If you compare that to pointing a camera outside of a window, onto a courtyard, to track people and say, "Where are they going; who is interacting with whom, who is influencing whom," we were trying to do these things with video sensors. 

When Greg came along, we realized this was the right time to move over to the mobile phone as a platform.  Greg Skibiski's initial idea, in starting the company, was to use this data for financial analytics.  Can we somehow use location data that's being gathered about us to help make predictions on retail sales?  Are people shopping?  Are they not shopping?  Are they drinking coffee at Starbucks, or are they not drinking coffee at Starbucks, but going across the street, to Dunkin Donuts? 

That was the initial idea behind the data.  Then, we realized it's such a rich data source that it's far beyond financial analytics.  The real opportunities are to bridge some of the amazing social tools, social networking tools, and also collaborative filtering, or recommendation tools online, into the mobile phone arena, and make them offline services instead of just online services.  That's what Sense Networks is now pushing towards and has been pushing towards, for the past two years, as its main mission.

Okay, I have to say something positive, right at the start here.  It's not a compliment I give very easily, but I have just taken a look while we're on this call - Sense Networks, and I'm looking at City Sense, at the moment.  It's perfect.  It's exactly lined up with where I saw value a number of years ago, and I'm amazed some people have got there quickly.  This is an area, which I had wanted badly two years ago, looking ahead.  I expected it to be another five years.  To see this, now, is very good.  I've just read through it and I think I'm known for not giving compliments easily, and this really is a perfect direction that you're taking, in terms of value and value going forwards to build upon.  I'm rather excited by that, and I hope you keep building in the direction in which you're going because I see a lot of value in this space.

Thanks, Lee; we're very excited about City Sense.  It's both a socially sexy application but it also has, down the road, very commercial opportunities, as well.  Basically, what we wanted to do was to make people city savvy with this tool and to give them a "sixth sense" if you will, of what's going on in the city around them. 

Let's say that I'm at home at night and I'm watching T.V., and it's 8:00 p.m.  I would like to know if I should go out.  I would like to know what's happening; where are people tonight?  Are they at a bar or is there some outdoor concert?  You can try to find that information online, but it's so much more immediate when you can see this real time sensor of where people are.  A lot of times, if you are waiting for the online summary, you've already missed the action because some things happen serendipitously. 

Another thing is, what we like about this is we're letting people vote with their feet.  When you look at City Sense, you're looking at data from, basically, hundreds of thousands of location pings, in real time.  This means that whether somebody has access or doesn't, if they're at this event, you'll see it.  You don't have to be a blogger.  You don't have to have your own website to post information about where you are.  It's basically this collective intelligence that anybody could contribute to by actually just showing up.

Another thing that's really great about it is it's honest.  You can't spoof the system the way you can say, "Come to this wonderful restaurant.  It got 5 stars out of 5".  If there is nobody there, there are no people generating GPS or location data.  You know that it's not a happening place.  So, there is this honesty to location data, which is much harder to spoof, which I really love about it.

One of the greatest examples was when we were in San Francisco.  We don't know our way around it very well, but we looked and realized that City Sense was recommending this bar location, which usually is completely dead.  There are no events there.  But, it was maybe 78% above-average busyness for that time of day and that day of week.  We went there and it turned out to be a very obscure bar that no one ever goes to, but some famous band sent their fan list an email saying they were going to have a big concert there.  All the fans of this band came.  The only way to know about this concert was if you were already on the email list, as a fan of this rock band.  We showed up and people were very curious about how we found out about it.  We actually, oddly enough, told the band and the people at the bar.  We weren't fans on the list; we actually used our sensor to discover this event that would never otherwise appear on any other medium.
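As an editorial aside, the "78% above-average busyness for that time of day and that day of week" figure can be illustrated with a small sketch. This is not Sense Networks' actual method, which the interview does not describe; the function name, the use of a plain mean as the baseline, and the ping counts are all assumptions made for illustration.

```python
from statistics import mean
from typing import Optional, Sequence

def percent_above_average(past_counts: Sequence[int], current_count: int) -> Optional[float]:
    """Compare the current ping count at a place with its historical average
    for the same day-of-week and hour-of-day slot (hypothetical helper)."""
    if not past_counts:
        return None  # no history for this slot, so no baseline
    baseline = mean(past_counts)
    if baseline == 0:
        return None  # avoid dividing by zero for places that are always empty
    return 100.0 * (current_count - baseline) / baseline

# Made-up counts for one bar on previous Saturdays at 9 p.m., and tonight.
past_saturday_9pm = [40, 50, 60]
tonight = 89
score = percent_above_average(past_saturday_9pm, tonight)
print(score)
```

Keying the baseline to the same slot of the week is what lets a normally quiet venue stand out, since a raw head count would always favor the busiest places.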

I think that's fantastic.  I really do.  It overlaps with efforts of mine, over the last three years.  Unfortunately, it's not my interview.  Maybe you can interview me, sometime [laughs]. 

It reminds me, though, of a conversation I had with British Telecom, I might add, a couple of years ago.  It was regarding their API.  They have an API where you can get the location of a subscriber.  You do a dip.  You get a single location.  It costs quite a bit of money.  The actual feedback I gave was, "Look, the value of this, especially at the cost, isn't so good.  You're missing the mark, here.  Where value would be is if you aggregate the masses of location information and make it available to the Web so you can do mashups and hot spots onto Google Maps".  Again, to see what's exciting, to see where people are going.  This is where value is and operators have that location information.  Make it available, not on a one-by-one basis.  Obscure it.  Make it thousands of people.  Then we can see people movement and we can build new applications just on that massive information.

I'm really surprised that you're already there. 

It was definitely a struggle.  You're absolutely right that a lot of the people who have this data aren't doing the things they could be doing with it.  There is definitely a lot of boot strapping we have to do, getting data from different sources to get this stuff going.  We're not a massive carrier that can just say, "Now, we're going to start using our own data that we've been storing for years and years."

Okay, unfortunately, with the way our interview happened, which was a call without pre-arranged questions or preparation, it's a real shame.  I would have liked to have looked over a number of things.  It's just the value space here is so incredibly high because operators, as you know, are in what we'll politely call "challenging times".  The thing is, the value that they have is really, what is officially called the "signaling running over their networks".  Who is calling whom at what time, who is in what location?  This is gold running over their networks.  Today, it's latent.  It's not being capitalized upon.  As you can see, you're beginning to look at such data and you're coming up with what I would call - you're on the road to very powerful applications from that data.

Do you see how operators have got this constant flow of location information and conversational information, as well?

Absolutely, and I think to the operators' credit, they have been using some of this information to start improving some of their churn modeling.  If you are an operator or a carrier, every morning you wake up and think about how you reduce churn and minimize churn.  They are missing these amazing opportunities that companies like Google are much more used to exploiting, which is how do we improve recommendations, search, collaborative filtering, marketing and advertising using this type of data. 

I think the carriers have the data but they don't think of all these opportunities, quite as aggressively let's say, as a company like Google, which is used to mining the data and basically closing the loop of potentially search with advertising or things like AdWords, which are a very tight loop of combining what you're interested in with what somebody would be interested to advertise to you.

You're absolutely right.  There is a massive amount of value in the data, in the aggregate sense, because you can measure what people are doing and get, first of all, statistically reliable information versus just asking one or two people.  You have millions of people who are giving you the answer.  In addition, you don't have to aggregate everyone into one big category.  Some of the things we've been doing were actually clustering people into one of twenty categories.  You don't really reveal too much about the person, as an individual, but you can still say some very general things.

For example, we can tell somebody where people like me go out, in San Francisco, as opposed to people like my parents.  We can automatically show you, not just the location of everybody as a generic person, but show you where the younger crowd or older crowd or the family crowd, or the business crowd or the tourist is.  All of this is something you can extract from the data without getting all the way to the level of personally identifiable information, or, the other extreme of complete aggregate statistics, but start getting what we call "tribe statistics". 

Tell me what my tribe is.  My tribe is the group of people that I hang out with.  Maybe it's some other tribe that I'm interested in.  Where do artsy people go at night, in San Francisco, or where are all the techy people, right now?  They might be at a conference or there might be a café in San Francisco, which is known to have a lot of the tech people meet up there and hang out.

I wouldn't know about this if I looked at the aggregate data and I certainly wouldn't be able to tease it apart if I looked at every person, one-by-one.  You need something in between.  You need something to go from total aggregation, to something closer, but not all the way into individual personalization where you say, "Here is where Tony is," because that might not be very meaningful for someone who doesn't know who Tony is, immediately, but would rather know what category Tony fits in.
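As an editorial aside, the "tribe statistics" idea, something between total aggregation and individual tracking, might be sketched as follows. The tribe labels, place names, and the `min_count` threshold are assumptions for illustration only; the interview does not say how the categories are assigned or whether small groups are suppressed.

```python
from collections import Counter, defaultdict

# Hypothetical input: one (tribe_label, place) pair per ping, with the
# user identity already stripped out.
pings = [
    ("tech", "cafe_a"), ("tech", "cafe_a"), ("tech", "conference_hall"),
    ("family", "park"), ("family", "park"), ("tech", "cafe_a"),
]

def tribe_hotspots(pings, min_count=2):
    """Count pings per place within each tribe, dropping places with fewer
    than min_count pings so that no single visit stands out."""
    per_tribe = defaultdict(Counter)
    for tribe, place in pings:
        per_tribe[tribe][place] += 1
    return {
        tribe: {place: n for place, n in counts.items() if n >= min_count}
        for tribe, counts in per_tribe.items()
    }

hotspots = tribe_hotspots(pings)
print(hotspots)
```

The output answers "where is my tribe right now" without ever naming a person, which is the middle ground described above.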

Okay, I have to fully agree with you, and again, I'm surprised somebody is so on the mark, so early.  It was only a few years ago that I was dreaming in these directions.  Would you be comfortable describing the work you're doing as at the intersection of sociology and computer science?

Absolutely, in fact, some of the papers we're writing describe this combination of computation or computational science and social science.  We're looking at ways of using more sophisticated algorithms and statistical tools.  There is just so much data that you can't model this using simple ad hoc techniques.  You really have to put on your statistician's hat, sometimes, and say, "It turns out there are twelve categories here.  We're not going to just write a simple description.  We're going to let the algorithm discover that there are twelve categories, on its own". 

A lot of this stuff - we've taken a data-driven approach, a little like the Google philosophy of things.  We don't go out and hand label places and people in our data.  We actually let algorithms automatically discover them and say, "It looks like this place is similar to this place, and this person is similar to this person".  We do this algorithmically.  We let the data drive the models, instead of impose some kind of ontology or structure and say, "There are forty-eight categories of people with ten sub-categories, and so on," manually.

It might be very interesting to combine both.  Combine the ontology or tagging and have your data drive, as you say.  It might be interesting, at some point, to look at both, combining the two.

Sure, yes, we're always interested in getting better ontologies, but one thing we've been doing is also looking at the real data, which has been changing our perspective on social science.  People really didn't write categories in social science when it comes to mobile phone behavior, or even location behavior.  There are categories from the traditional social science surveys, where people fill out questionnaires or they do polling, but the categories we see in our data are very different.  They're basically based on movement and mobile phone behavior, which doesn't necessarily fit into the traditional ontologies.  There might be some ways to overlap the two, but in the sense, the data is driving new twists and turns in the ontology, right now. 

I just have to jump in, at this point, because I'm very inquisitive.  Are you aware of the work of Nathan Eagle?

Oh absolutely, Nathan is a very good friend of mine.  Actually, we were together at a conference, last month, where we were both presenting some of our research.  Yes, Nathan is doing some wonderful work, especially work in Nairobi and in Africa, where he has been using mobile phone data from people who rely on their mobile phones 24/7, as their only means of communication or interaction with the electronic world.  Things like financial transactions are made through your mobile phone because you have no other choice. 

It's a very rich data set because in a way, some of these underdeveloped countries are getting better data than we are because everything is happening to their mobile phones, in one device, versus the way we live.  We have different computers, a desktop at work, a laptop at home, a mobile phone, and they all have different cookies.  How do we glue that all together?  In a way, we have a piecemeal view of our data, whereas, some of these African countries have a more complete data set of what people are doing because it's all on that one platform.

I appreciate that viewpoint.  Nathan spoke at the debut in 2008, and his video is on the Web.  It's really interesting to look at what Nathan said, look at what Marc Smith said, and again, his video is on the Web for the debut 2008, and then to take a look at City Sense.  I have to say, again, this is really fantastic.  There is just massive value, the money, potentially, in these directions is absolutely incredible.  Actually, these applications and what you're doing there will grow bigger and will actually be, in my opinion, and it's a very strong opinion, transformative of society.

To give that some teeth, or at least not to sound as if I'm blue-skying here, if you look at Google AdWords, how has Google made its money?  They've made it in online advertising.  It's been contextual advertising, but the only input to that has been a search string.  It's very little to judge somebody's intention.  That is almost as if we're at the typewriter phase of advertising.  Google only took an incremental step forwards, and the scope is almost beyond belief.  With the work that you're doing, you can see what I would call an "AdWords 2.0".  Even that is not giving it the leap forwards that is possible.  You've got so many more input parameters, almost to the point where, at least in an idealist sense, advertising doesn't exist because it's so finely tuned that it's conversation.  Would you agree?

I agree.  In fact, what we've been looking at is not just asking somebody to type in a search query, but trying to figure out what they're interested in next.  Imagine if I pulled up a phone and it already knew that I had just had dinner, so it wasn't going to recommend a restaurant to me.  It knew that it's a Saturday night and I usually like to go out after dinner because I was some place that's labeled a restaurant, for the past two hours, and it would recommend, in this new city, maybe three different bars or lounges that I would enjoy.  If it knows my history and my personality, it can recommend these places, in a new city for me, very accurately, and know that I'm interested in that, right now, because it's 9:30 p.m. and I just finished dinner and it's Saturday night.

In recommending those three places that are "Tony's favorites", it can also give me three sponsored links that are not too different from my favorites, but maybe are other tourist attractions that want to advertise.  Now that it's targeted towards me and it's interweaved into my recommendation engine, I'm willing to entertain these kinds of sponsored links, and go to one of these sponsored bars or sponsored lounges or some other tourist attraction, or if there is a movie or a theater.  I know that my system learned about me and gives me pretty good recommendations.  Now, I'm willing to entertain these "sponsored" links, just like when we use Google.  We know that it's learned a lot about how we use these sentence search queries to find what we're interested in, so we're willing to entertain its advertising on the sidebar and say, "Let me take a look at one of these sponsored links, as well". 

Once you're able to provide smarter search and recommendation, then you can start using that for advertising, as well, and saying, "The smarts can now be used to also pay for some actual sponsored value or ad value".  That's exactly what we are looking to use the location data and the location history of each user for: as a way of automatically figuring out what to recommend, from your history and the place you are, right now, and to tie it to your favorites as well as some sponsored, real places that we would make mobile advertising dollars off of.

Not only the location changes in one person, as in you are in a new city that you are not normally in, but you also have the data on what people do en masse, the aggregate data.  You know that people normally, at 5:30 p.m., get in a motor vehicle and people who go on "X" road will pass by this gas station.  You begin being able to predict peoples' lives by looking at the aggregate and being able to look at the individual.  Because you know what they're going to be doing next or likely to be doing, or even being able to deduce what their wish is, you can begin advertising to that, satisfying the wish or the desire. 


Do you think a day will ever come where you sit down and pizza comes to the door, but you didn't order pizza; it was just that time of day and you sat in the position where you normally would have dialed for pizza.  It almost becomes as if your thoughts, desires, and wishes - the ultimate is where they're getting fulfilled without you actioning them explicitly.  Do you think that's still sci-fi, going that far?

I think it's a little sci-fi, but it's not so sci-fi if you think about it.  It's already happening in the online world.  Before you even know about it, your wishes are being predicted when you log into Amazon and it recommends a book you should read, based on the previous books you've read.  All of a sudden, it's anticipated "Lee, you should be reading this book".

If you think about it, ten years ago, that would have been completely sci-fi.  Today, it's pretty much something we're used to.  We're doing the same types of things with restaurants.  Let's say you went to a fantastic Italian restaurant, I was there with you, and we both enjoyed it.  Actually, I wasn't there at the same time.  Let's say you went to an Italian restaurant, in New York, a few months ago.  I went there, last week.  You also went to another Italian restaurant and enjoyed it.  Since we both co-located in one place to begin with, maybe I should be recommended this new restaurant that you've also discovered on your own.  In a way, the smart discovery is almost like this intelligent pizza prediction.  We're using what somebody just like you wants, to make additional recommendations to you, and also your own personal history.

We can start figuring out, not just from your own data, but the aggregate, a pretty good prediction of what you're interested in, right now, and making these kind of serendipitous discoveries.  Pizza is kind of a temporal type of prediction of right now is pizza time, but there is the whole discovery type of prediction of not this pizza, but you would be interested in trying this brand new, brick oven pizza place that just opened down the street because it seems as if people like you also enjoy it.
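As an editorial aside, the Italian restaurant example is a form of collaborative filtering driven by co-location. A minimal sketch, assuming each user is reduced to a set of visited places and that overlap is scored by a simple count of shared places (both simplifications of my own, not something stated in the interview):

```python
# Hypothetical visit histories; user and place names are made up.
visits = {
    "lee": {"italian_a", "italian_b"},
    "tony": {"italian_a"},
    "sam": {"sushi_c"},
}

def recommend(user, visits):
    """Suggest places the user has not visited, taken from people who share
    at least one visited place with them, ranked by how much they overlap."""
    mine = visits[user]
    scores = {}
    for other, theirs in visits.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # nothing in common, so their places carry no signal
        for place in theirs - mine:
            scores[place] = scores.get(place, 0) + overlap
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))

print(recommend("tony", visits))
```

Here Tony is pointed at the second Italian restaurant because Lee, who shares one place with him, has also been there, while Sam's unrelated history contributes nothing.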

Or, like first degree friends have been there.  That kind of gives relevancy if people are first degree or second degree or third degree, I would imagine.  Then the social graph is probably an important input parameter.

Absolutely, and at Sense Networks, we have built social graphs from location data.  We have almost 100,000 users in our social graph that we've been tracking for several years.  They're opt-in users and their data is kept very secure.  They actually own their own data, which means they can delete it.  But, we are basically building a Facebook-type application from the location data.  You have people you may not actually know, which are your first degree friends, because it turns out they're hanging out in very similar places as you are, maybe actually the same places even though you don't know that fellow's name.  He's always at the same restaurants, he goes to work in a similar place, and he always does similar types of things.  We're doing this data-driven social network. 

If somebody in your first ring of mobile friends or your mobile book ring of friends does something, chances are you'd also be interested in doing it, as well.  There is also the second layer and more and more degrees of separation as you work your way away from your most immediate friends in your network. 
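As an editorial aside, a data-driven social network of this kind could be sketched by linking users whose sets of visited places overlap strongly. Jaccard similarity and the 0.5 threshold are my own choices for illustration; the interview does not say which similarity measure Sense Networks uses.

```python
from itertools import combinations

def jaccard(a, b):
    """Share of places two users have in common, out of all places either visited."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def colocation_graph(places_by_user, threshold=0.5):
    """Return edges (user, user) -> similarity for pairs at or above the threshold."""
    edges = {}
    for u, v in combinations(sorted(places_by_user), 2):
        similarity = jaccard(places_by_user[u], places_by_user[v])
        if similarity >= threshold:
            edges[(u, v)] = similarity
    return edges

# Hypothetical users who have never met but frequent the same places.
places_by_user = {
    "u1": {"office_x", "cafe_a", "gym"},
    "u2": {"office_x", "cafe_a", "bar_b"},
    "u3": {"park"},
}
graph = colocation_graph(places_by_user)
print(graph)
```

Users `u1` and `u2` become first degree "mobile friends" purely from shared locations, with no declared friendship involved.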

I just get really excited because for me, the 1990's were about the world wide web.  The world wide web was linking pages together.  Okay, we got incrementally improved, after the year 2000, when we began seeing audio and video, especially in 2006 with YouTube etc. on the Internet.  So we were getting content put out there as well as pages hyperlinked together.  That had a lot of value, what the world wide web has done.

But that was primarily linking pages and embedding content.  What happens when you truly start linking people together, people and places, and using the data storage, data mining behind that and really trying to drive forth commerce, and prompt social interaction between people?  As you just said there, you might be in the same location as somebody or fairly often in the same location but you don't know each other.  Should it not be that your device then starts prompting social interaction because it deems that the other person may be a valuable new relation?

Absolutely and I think we're missing out on a lot of that information when we're only using online sources for it.  That person who just happens to be always in the same types of places you are, that's a person who is more like you than somebody in your Facebook network, potentially.  I've linked to family members and grandparents, on Facebook, and yes, they are people in my network, but the reality is I have more in common with somebody who goes to the same places, works in the same environments, has coffee in the same places as me, than my grandmother, even though she is in my Facebook network and this person isn't.

So there is a lot of data being missed completely by some of these online networks versus the offline networks.  The offline data, in many ways, is richer and it doesn't involve self-reporting.  In a way, when everyone reports their personalities online, it's not the true you; it's the aspirational you.  Meanwhile, the mobile phone data is the true you.  Everybody has their online persona where they may put their flashiest pictures on Facebook and they have a very different side of themselves.  The true you is what they're doing in the offline world.  That's much harder to fake, let's say.  We're able to collect both the aspirational you and the true you, and also serendipitously discover people like you, that you otherwise wouldn't know because of this co-location type of information.

Okay, generally speaking, you would say that the offline data - where you are, your movements in a day - it sort of straddles the border between offline and online.  The calls you make and SMS that you send, those call detail records, that location information - which VLR you're on throughout the day is owned by operators.  Operators, when it comes to their business model, it's telephony and SMS.  Both of these are pretty much saturated.  They're certainly not growth markets, over the next five years.  There are only so many minutes we can speak and call prices are heading down etc. 

I almost want to say it's unlimited value when you start taking that data, which is signaling technically, telecommunication signaling it's carried in; when you start marrying up that telecommunication signaling data with Web-style applications and so forth.  I'm wondering if you share that same hope that there can be this perfect marriage between telecom companies, because of the data they have flowing in their networks, and I believe the likes of AT&T actually record all network signaling and record terabytes of this network signaling, for many years now, but they're not doing anything with it.  Do you see such a perfect marriage between telcos and Internet companies where they can come together, let that data come in, let Internet-style companies capitalize or close loops, and create something which is fundamentally transformative, high value, and I almost want to straddle into saying a new economy?

I think a lot of people want to do this.  There are some technical and ownership issues, like how to combine offline cookies with online cookies, and your telephony information with your online information.  Another issue is that it's hard to mine your actual communication when it's voice communication.  It's also, in the end, a very sparse network of people you talk to with your phone and whom you SMS with your phone versus the types of networks we see, for example, when we are building a network of how you move around similarly to how other people move around the city.

There are issues of the "sparsity" of the call network, which is what a lot of the carriers call it; whom do you call in your calling plan on your cell phone.  That can be pretty sparse and it's harder to use that, for example, than to use a denser network like the Facebook network, which sometimes is denser.  You might have a thousand Facebook friends, but you don't phone them all up the same way.  There are some sparsity issues.

But I do think the Holy Grail is to combine all this data into a multi-dimensional perspective of the user, not just their online or offline personality but what things they buy, read, and how we can enrich their lives by making intelligent recommendations.  Just as a final thought, as an engineer, and a person who has to crunch through a lot of this data in practice, there are some hurdles because the data is always different.  The carriers are storing very different types of data than Facebook, Google, or any website.  Even with the cross-carriers, each carrier stores drastically different types of records.  There are different language barriers and so on. 

One of the reasons we focused on location data more than any other type of data source is because it's always the same.  It's always latitude, longitude, time stamp.  It's just four numbers, if you also include the user ID, as well.  There is a user ID and then there is latitude, longitude, time stamp.  That's true in New York, in England, in Tokyo, and Africa.  Everywhere anyone is on this planet, they're generating these three numbers with a user ID, or four numbers.  We ping that every so often.

What's great about that kind of data is every carrier is generating the same type of data set there.  It always means the same thing.  You don't have to convert the text interaction or the SMS from double byte to some other language.  When it comes time to integrate things across an international scale, it turns out that some types of data are much easier than others.  Location is one of the common bridges, across the board.
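To make the point about a uniform record concrete, here is a minimal Python sketch of the four-field ping described above (user ID, latitude, longitude, time stamp). The class name `LocationPing`, the function `parse_ping`, and the comma-separated input layout are assumptions made for illustration; the interview does not describe any actual file or wire format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LocationPing:
    """One location sample: the same four fields wherever it was produced."""
    user_id: str
    latitude: float      # degrees, -90 to 90
    longitude: float     # degrees, -180 to 180
    timestamp: datetime  # stored as UTC


def parse_ping(line: str) -> LocationPing:
    """Parse 'user_id,lat,lon,unix_seconds' (a hypothetical CSV layout)."""
    user_id, lat, lon, seconds = line.strip().split(",")
    ping = LocationPing(
        user_id,
        float(lat),
        float(lon),
        datetime.fromtimestamp(int(seconds), tz=timezone.utc),
    )
    if not (-90.0 <= ping.latitude <= 90.0 and -180.0 <= ping.longitude <= 180.0):
        raise ValueError(f"coordinates out of range: {line!r}")
    return ping


# The same parser handles a ping from New York and one from Tokyo.
pings = [
    parse_ping("u1,40.7580,-73.9855,1234400000"),
    parse_ping("u2,35.6762,139.6503,1234400060"),
]
```

Because every source reduces to the same four fields, nothing in the parser depends on the carrier or country.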

I really respect that; it must make life easier.  On the more difficult side of things, you cannot help but dream of being able to link everybody's different IDs together for the ultimate aggregation to make the most powerful applications, i.e. being able to combine people's Amazon wish list and book purchases with their YouTube video subscriptions and YouTube videos that they uploaded and songs with their location data, analyzed against a backdrop of aggregated location data. 

Do you have any of these pipedreams of combining everything so it can all be crunched together?

I do like that idea, especially because so many things are hidden redundancies in the behavior.  If I know a lot about someone's musical tastes and book tastes, that should be a pretty good prediction about their movie tastes, for example.  In many ways, each one of these websites is really storing a network of their users.  It's their own little Facebook network of who is like whom.  Who is like whom, in terms of their book preferences?  Who is like whom, in terms of their movie preferences?  Imagine that each person is a dot or a node and we draw links between them if they have similar purchasing or browsing behavior.

I think the dream, at some point, is to have this network with different layers of connectivity where you have overlap with somebody based on your book preferences, overlap with someone based on your movie preferences, your social preferences, etc.  If you can aggregate that into some giant network, I think there are a lot of very impressive things you can do.  In the end, we're actually much more similar than we think we are and we have a collective intelligence.  The smart things you do in your day can help me in my day and vice versa.  If you find an interesting book, an interesting place to eat, read something interesting online, I should be told the same thing.  There is a massive amount of information but the only way to organize it is to leverage the intelligence of your fellow users, either mobile users or online users.
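The layered network idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Jaccard overlap as the similarity measure, a fixed threshold for drawing a link, and summing weights across layers are all choices made here, not methods described in the interview, and the sample preferences are invented.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Overlap of two preference sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_layer(preferences: dict, threshold: float) -> dict:
    """Return {(user_a, user_b): similarity} for pairs at or above the threshold."""
    edges = {}
    for u, v in combinations(sorted(preferences), 2):
        score = jaccard(preferences[u], preferences[v])
        if score >= threshold:
            edges[(u, v)] = score
    return edges


def combine_layers(layers: dict) -> dict:
    """Sum edge weights over all layers (an assumed, very simple aggregation)."""
    combined = {}
    for edges in layers.values():
        for pair, score in edges.items():
            combined[pair] = combined.get(pair, 0.0) + score
    return combined


# Invented sample data: one layer per kind of preference.
books = {"lee": {"b1", "b2", "b3"}, "tony": {"b2", "b3", "b4"}, "ann": {"b9"}}
movies = {"lee": {"m1", "m2"}, "tony": {"m1", "m2"}, "ann": {"m2"}}

layers = {
    "books": build_layer(books, threshold=0.5),
    "movies": build_layer(movies, threshold=0.5),
}
network = combine_layers(layers)
```

Pairs that overlap in several layers end up with heavier links than pairs that overlap in only one.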

I have to agree with you and we're swamped in more and more communications.  We're swamped in more and more information, media, and content.  We only have so much time and with the new scarcity being time and attention, then the more data you have, as you say, the better you can filter.  The better you can recommend.  What you're doing is cutting through this massive rise of information content and communications and only pointing out what is relevant.  You can use this data to filter calls, to filter communications generally, to filter media.  This is why, at the start, I said this is the road to what I feel is unlimited potential.

I guess I asked you about combining all the data sets.  I just wanted to know if you were a dreamer, as well.  I'm glad to hear that you are but you've been more practical just looking at the location because every operator produces it in the same format.

You talk about this sea of connectivity, which can be many layers and viewed from many dimensions.  The whole value is in the sea of connectivity and the worldwide web is really just a beginning of that.  People get excited about the Web and the phone, but that's just pages on a phone [the mobile Web].  That's not that exciting, really. 

When you start taking the data that phones are producing, aggregating it, and combining it with the online world, the Web, and vice versa - now that I've spoken to you a little bit, I think you're well aware of where the value lies.  I kind of smile because often, people are depressed about the state of telecoms and so on.  It need not be the case.  Again, I just feel it's an unlimited opportunity.

If I jump from unlimited opportunity and mention another topic in this area, because it's an area I've been thinking about for years, it's the Big Brother aspect.  Although the applications become terribly exciting, with worldwide value, and they drive efficiency, the economy, and so on, there is this Big Brother issue.  It actually makes Big Brother seem pale.  Have you thought about the whole privacy thing when you can predict who will like whom, what time somebody will leave work, what it is they'll order, and you begin to even know what time they clean their teeth every night?  Have you thought about the whole privacy challenge, in that basically privacy is gone?  We just accept it.

You're absolutely right and privacy is a very challenging aspect of a lot of this.  We do think about it all the time.  At Sense Networks, we've taken some important lead steps in defining new data ownership plans.  Alex Pentland, our Chief Privacy Officer, has drafted a plan where the basic idea is that you own your own data.  We think that's the future model for how a lot of this valuable data, be it for consumer, commercial, or government types of scenarios, is collected.  At the end of the day, I think the consumer or individual should own the data.  When you download City Sense, for example, you have ownership of your own data.

If, at some point, you decide, "Hey, I don't want to use the service anymore.  Delete everything you stored," we'll delete the data.  Or you can say, "Delete my last 24 hours".  At the end, the data is owned by the individual.  If it's deleted by the individual, then it's not our responsibility.  It can't be used for a subpoena because the government wants to figure out where Tony was on November 24th.  The data is deleted and we don't own it.  I think that is an important step, to provide data ownership.

The other issue is that if there is enough value, you don't mind the data being used.  Let's say you don't want to reveal to somebody that you're a diabetic because it's a violation of your privacy.  If you're being rushed to the hospital because you were just hit by a car, and you're about to be injected with something that might interact dangerously with your diabetes, you would want your doctor to know, at that point. 

You don't want to give up your information, your privacy, for nothing.  I think the worst-case scenario of that is when it's Big Brother.  You're giving it up for nothing.  But, if there is some real value to the exchange of private information, then it becomes a reasonable transaction.

If I'm at a party and I meet someone, and they ask me where I live, in that setting there is a social interaction and there is some value being exchanged.  I'll reveal where I live.  If somebody on the street just stops me and says, "Tony, please tell me your address".  Not Tony, because they wouldn't know my name.  "Please tell me your address."  I wouldn't reveal it because there isn't any value transaction.

The more value we can provide for the data, the less people feel bothered by giving up some privacy.  There is a dollar value to privacy, and if you don't give the value back, in terms of a social recommendation, or some smarts, then you don't get to buy the privacy.  That's our philosophy about it.

Okay, I had come to the conclusion that the more you give up privacy, the more free things you get back.  You see that, today, with how people use the Web, and Facebook.  I just saw the future as where you're almost forced to give away ever more and be ever more transparent about every micro-aspect of your life in order to gain some leverage back, as in connecting with new people, getting things cheaper, getting more relevant information.  Do you see some external - do you see some pressure coming where you really don't have an option but to make your life fully transparent, otherwise, you get harmed in creating new relations, having relevant information, better product offers, and so on?

There are two sides to that coin.  I think you do need to reveal to get more back because, especially if there is something like a targeted ad, you're never going to get that ad discount or coupon if you don't reveal that you're one person who is likely to use that promotion, for example.  It's not valuable for a company to send promotional materials to someone who will never use them.  You might miss out because they don't know that you're in the category of somebody, for example, who would go to a theater or an opera.  Why would they send you a flyer if you would never go?

There is also some good news from the algorithms and computer science side, which is showing that a lot of the calculations that you need to figure out if two people are similar or if they're in the same tribe or cluster, they can actually be done in a way that's privacy preserving. 

Some of our research, and this is getting fairly technical, has proven that if you and I have a list of movies we liked and didn't like, we don't have to reveal that list of movies to each other to figure out that we're similar to each other.  There is a way to send the information back and forth, in a very privacy-preserving way, so that if we don't have enough compatibility, we'll be told "Lee and Tony are below 10% compatibility," without either one of us actually exchanging our profile information. 

There are some ways of doing this, now, and my future vision is that you have the data about you stored in a safe repository.  Then, if you want to figure out who is like you or there is some other information you want to extract from the data, you don't have to reveal it all.  You can just reveal it in a piecemeal fashion so you can still get the job done.  You can still make good recommendations.  You can still find tribes or clusters without broadcasting the data totally publicly.

This has been an actual breakthrough in computer science, which is basically privacy-preserving computation.  How do we figure out that we're similar or that we're linked in a network, without revealing everything about ourselves?  I think that is some good news for the consumer and the public, at large.
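The interview does not say which protocol Tony's research uses, so the following Python sketch swaps in a different, well-known idea purely as an illustration: a Diffie-Hellman style private set intersection count, where each side blinds hashed movie titles with a secret exponent and only doubly blinded values are compared. Everything here is an assumption made for the example, including the names `Party` and `compatibility`, the use of liked titles only, and the toy modulus. It is not secure as written, and it still reveals the list sizes and the size of the overlap.

```python
import hashlib
import math
import secrets

# Toy modulus for illustration only: 2**127 - 1 is prime, but far too small
# and not a vetted group for real cryptographic use.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Map a title to a group element (squared so it lands in one subgroup)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return pow(int.from_bytes(digest, "big") % P, 2, P)


def _random_key() -> int:
    """Pick a secret exponent that is invertible modulo the group order."""
    while True:
        key = secrets.randbelow(P - 3) + 2
        if math.gcd(key, P - 1) == 1:
            return key


class Party:
    """Holds a private list of liked titles and a secret exponent."""

    def __init__(self, liked):
        self._liked = set(liked)
        self._key = _random_key()

    def blind_own(self):
        """Blind our own hashed titles once; shuffle to hide their order."""
        out = [pow(hash_to_group(t), self._key, P) for t in self._liked]
        secrets.SystemRandom().shuffle(out)
        return out

    def blind_other(self, values):
        """Blind the other side's already blinded values a second time."""
        out = [pow(v, self._key, P) for v in values]
        secrets.SystemRandom().shuffle(out)
        return out


def compatibility(a: Party, b: Party) -> float:
    """Jaccard overlap computed from doubly blinded values only."""
    a_twice = set(b.blind_other(a.blind_own()))
    b_twice = set(a.blind_other(b.blind_own()))
    shared = len(a_twice & b_twice)  # equal only where the titles matched
    union = len(a_twice) + len(b_twice) - shared
    return shared / union if union else 0.0


lee = Party(["Alien", "Brazil", "Casablanca", "Dune"])
tony = Party(["Casablanca", "Dune", "Eraserhead"])
score = compatibility(lee, tony)
print("below 10% compatibility" if score < 0.10 else f"compatibility {score:.0%}")
```

The comparison works because exponentiation commutes: a title blinded by Lee and then Tony equals the same title blinded by Tony and then Lee, while neither side ever sees the other's plain titles.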

I agree with you and it reminds me of some work that IBM in Zurich had done.  Let me discuss that with you, offline.  When you said that people own their data and that should be the model, you do worry about people caching it, and so on.  It reminds me of the Seven Laws of Identity from Kim Cameron.  I don't know if you've heard of him.

I've heard of him, but I don't know the seven laws.

Okay, he came up with seven laws of identity, but like rules of digital identity.  One of them is that people should have control of their own data.  Where are you getting this location data?  I have to ask because before you, before this surprise call, I was interviewing Russ McGuire, from Sprint.  Again, I was saying, "Why are operators not making location information available so we can merge it with the Web?"  I'm wondering how you're getting the location information.  Is it from the network, the handset, or both?  Where are you getting it?

We have subscribers who use City Sense and are providing the data in exchange for using City Sense.  There are a fair number of people there.  Also, there are people using our buddy finder application, where they show their location only to friends, but we're the trusted third party, which basically gets the data and shows the location to the friend.  In exchange for running this service, we collect the data but we don't sell it or reveal it to anybody.  That's where we get our users.  We also get a lot of taxi information. 

It turns out that a lot of the cities in North America, and elsewhere, as well, have been outfitting vehicles with high accuracy GPS units, for taxis and other public vehicles.  What's great about that data set is it gives us high density.  For example, in New York, we have 18,000 taxis.  Every few minutes, we get their location.  Every time they pick up someone or drop them off, we get their location.  What's great about that data is you don't have to be tech savvy to generate location information.  It's a great way to de-bias the data so that it's a view of everybody versus just a view of people who have smart phones and iPhones or BlackBerrys with GPS.

In addition to our users, we have a large amount of data from vehicles and taxis.  I think you always need a bootstrap.  The taxis were an excellent bootstrap because no one wants to look at City Sense and say, "Here are twenty people, and here is me among twenty people".  What was great about the taxis is we were able to instantly give high-density information, in addition to our 100,000 users, using the taxis.  Right away, there was some immediate value for somebody, even if they were the first user of City Sense, when no one else was on there, there was still something to see because of the taxis.

I think that is true about a lot of these viral phenomena like Facebook or the Internet.  There is no value until enough people start using it.  The key trick is to do that bootstrap of getting the first 10,000 users through universities, as Facebook did, or to get the taxis in there to get things bootstrapped.

I think once things are bootstrapped, then it's more interesting for people to use the service and then it makes a company like Sprint think twice and say, "Wait a minute; now that this stuff is taking off, maybe we don't mind contributing our data, at this point". 

Our hope was, in starting this bootstrap, that it would release the floodgates of the carriers, which would now feel better about taking that step forward.  We're a small company.  We have less to lose by doing something like this than someone like Sprint.  Even if they lost 1% of their customers, that would be a huge nightmare for them.  We could take something risky, like this, and try it first, in the hopes that it would create this bootstrap that would start the data sharing philosophy across all the carriers.

Okay, so I'm just wondering here.  Are you guys thinking, already have thought of, or already offer data mining for operators?  I mean, looking through millions of subscribers' records, looking through the location history.  We're talking about massive data sets.  Are you guys offering that or thinking of offering it, at all?  Again, it is a big future ahead, but that's instant money.  Operators really need a firm prepared to start going through that signaling information and [0:54:11.4 unclear] out to developers, analyzing it, and helping them build towards a future with it.

Absolutely, that's core to our business model.  What you've just described, you should be one of our chief executives because that is where we're going next.  We've been working with some carriers and some other people in the mobile arena who have this location information.  We've built an API that lets you process the information and do all sorts of analytics.  Much like Google Analytics analyzes Web data and Web flow information, we analyze the flow through places.  We build profiles of people that let us build social networks from the location data.  It's all integrated into an API that we've built, which is a series of software tools that let you process large amounts of location data from mobile phone users in a very easy to use API, with Web interfaces.  It uses cloud computing in a lot of the machine learning tools.

The hope is that, once these tools are available to them, a lot of these carriers don't have to invest two, three, or four years of research time with their own research labs.  They can directly use this API and start analyzing this data.  They're already very savvy with their CRM models, churn models and network models.  Location data has been sitting idle.  I think one of the main reasons is because it's harder to work with.  With our APIs we've kind of reduced that hurdle. 

If you do think about it, location data looks like spaghetti.  For every user, it's basically spaghetti of dots and trails of where they've moved around.  It's not very easy to plug into a database, the same way it is to plug in someone's zip code or age, which is a much easier database item to work with than a spaghetti trail of location data.  We've built an API that helps convert this location data into nice database fields and also builds networks and graphs out of it, making it very easy to mine the data automatically, for people who haven't.
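As a rough picture of what turning a "spaghetti trail" into database fields could look like, here is a small Python sketch. It is not the Sense Networks API, which the interview does not detail. The grid size, the choice of summary fields, and the "hours before 06:00 UTC count as night" rule are all assumptions made for the example.

```python
import math
from collections import Counter
from datetime import datetime, timezone

CELL_DEGREES = 0.01  # assumed grid size, roughly 1 km north to south


def grid_cell(lat: float, lon: float) -> tuple:
    """Snap a coordinate to a coarse grid cell so nearby pings group together."""
    return (math.floor(lat / CELL_DEGREES), math.floor(lon / CELL_DEGREES))


def summarize_trail(pings) -> dict:
    """Reduce one user's (lat, lon, unix_seconds) pings to flat, column-like fields."""
    all_cells = Counter()
    night_cells = Counter()
    for lat, lon, seconds in pings:
        cell = grid_cell(lat, lon)
        all_cells[cell] += 1
        hour = datetime.fromtimestamp(seconds, tz=timezone.utc).hour
        if hour < 6:  # assumed definition of "night"; real data needs local time
            night_cells[cell] += 1
    if not all_cells:
        raise ValueError("no pings to summarize")
    return {
        "ping_count": sum(all_cells.values()),
        "distinct_cells": len(all_cells),
        "most_visited_cell": all_cells.most_common(1)[0][0],
        "night_cell": night_cells.most_common(1)[0][0] if night_cells else None,
    }


# Invented trail: two night-time pings near one spot, one daytime ping elsewhere.
trail = [
    (40.7583, -73.9857, 1234400000),
    (40.7589, -73.9851, 1234403600),
    (40.7128, -74.0060, 1234443200),
]
row = summarize_trail(trail)
```

Each key in `row` could be stored as an ordinary database column next to fields such as zip code or age.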

Okay, so in my opinion - and I don't know your company except when I looked at the Web, after you called.  You have an instant way of making money, today, because I know there are a lot of operators willing to pay a lot of money for the service you just described.  I see it going long term in the future.  I kind of wish I were in your shoes.  I have to be honest; I think you're in a fantastic position.  I'm very enthusiastic about it.  Also, you're talking about location.  At least you're being concrete as in focusing on location. 

It's so exciting because you can just keep building upon that by plugging in users' Facebook social graphs.  Facebook potentially could send them a text message and they have to enter the code on Facebook.  You know it proves that their Facebook ID is linked to their mobile phone.  Then you could read your social network and plug that data until the online activity gets more married-up.  It's just almost an infinite path of development there, and it gets more powerful.  That's really exciting, but I know we've been on the call for such a long time.  I'm getting a bit conscious that I'm keeping you too long.  I would love to jump to a couple of other questions. 

For me, let's say you're in a tenement block, housing block - whatever sort of cultural terminology you want to use.  Often, these buildings have hundreds of people in them or maybe fifty at the small end.  You don't know what everybody's doing.  Is it not more natural to begin seeing the profiles of the people in your building?  You talk about City Sense, and to me, you're almost talking about a sixth social sense.  It would be nice to see the profiles of the people in your building.  Or, if you're waiting at an airport and you have time to kill, why are we not sensing what other people are interested in, what their key skills are, where they're from?  These latent connections and relationships between people may be relevant to us so maybe we should be making them explicit.  What do you think?

I agree.  In fact, your example is a very good one.  The way our algorithms work, you would actually get some linking to the people that lived in your city block or tenement, because of the co-location within a reasonable neighborhood or distance.  They might not be in your immediate circle of friends, but since they live on the same city block, they're probably going to be closer to you on your social network than any random user would be, for example, elsewhere. 
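The co-location effect described above can be illustrated with a short Python sketch that links two users whenever they appear in the same grid cell during the same time window. The cell size, the one-hour window, and the function name `colocation_counts` are assumptions for the example, not details of the algorithms Tony refers to, and the sample pings are invented.

```python
import math
from collections import defaultdict
from itertools import combinations


def colocation_counts(pings, cell_degrees=0.01, window_seconds=3600) -> dict:
    """Count, per pair of users, how many (cell, time window) slots they share.

    pings: iterable of (user_id, latitude, longitude, unix_seconds).
    """
    present = defaultdict(set)
    for user, lat, lon, seconds in pings:
        slot = (
            math.floor(lat / cell_degrees),
            math.floor(lon / cell_degrees),
            seconds // window_seconds,
        )
        present[slot].add(user)

    counts = defaultdict(int)
    for users in present.values():
        for pair in combinations(sorted(users), 2):
            counts[pair] += 1
    return dict(counts)


# Invented sample: Lee and Tony share a block in two different hours; Ann is far away.
sample = [
    ("lee", 40.7583, -73.9857, 1234400000),
    ("tony", 40.7589, -73.9851, 1234400300),
    ("lee", 40.7583, -73.9857, 1234404000),
    ("tony", 40.7586, -73.9853, 1234404500),
    ("ann", 35.6762, 139.6503, 1234400100),
]
links = colocation_counts(sample)
```

Higher counts stand in for "closer on the network"; neighbors in the same block accumulate shared slots that a random user elsewhere would not.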

There is similarity because people co-locate because they live in the same neighborhood.  It actually turns out, and if you ask marketing people, sometimes there is a similarity by what you do, which is much more important than the similarity because of where you live.  For example, two women living in Dallas, Texas, where one woman shops at Prada and the other woman shops at Wal-Mart, have less in common with each other than that woman in Dallas, who shops at Prada, has in common with a woman who shops at Prada, in Tokyo.

From a marketing standpoint, there are more commonalities by where you shop than where you live.  Of course, we don't buy that 100%.  We do look at where you live, but we also combine that in aggregate with other types of activity behavior.  You're definitely right; you want to look at your neighborhood, but also at the people who do things like you, that go to similar restaurants and have similar interests and so on.  It's a giant network.  We are six billion people on the planet so there are many ways to find commonality.  We believe there is a nice balance of all these different flavors of things. 

If you are at an airport or you are somewhere, it would be nice if a system could take the initiative and figure out "The person right next to you is your neighbor," and you could start talking to them about ...

Or, went to a conference you once went to.  There are all these potential latent connections.  When it comes to telephony, man's biggest machine - the telephone network, it's got a lot of value because it has so many potential connections.  Some connections are a huge value, like dialing emergency services.  Although you hope never to use that connection, it's the potential of that connectivity.

The thing is, the telephone network is just connecting devices together.  There might be a billion fixed-line subscribers but you don't know about them.  You only know about one hundred fifty people, say.  You've lost all that value that would be exponentially higher if you just knew what was behind the telephone, i.e. about the person behind each of these devices.  Do you see how there could be so much more exponential value in communications if you just knew more about the thing that uses the device?

Yes, I absolutely agree.  There are so many people out there, that it becomes more and more important to know something about those people.  You've got to pick who you want to talk to, out of six billion people.  It was okay when we all lived in villages and there were about three or four hundred people in the village.  You could quickly figure out who is about what and get a good model of everybody within your reach. 

The Internet and communication networks have been great because they've opened up our reach so that it is worldwide.  What we have lost, while we've grown the network, is the ability to immediately know whom we want to communicate with in the network.  It's great that we've gone from the ability to reach three hundred people to the ability to reach six billion. 

Now, we've lost the ability to figure out whom we want to reach, out of those six billion, because it's not something we can do by peeking outside our house, looking at everyone walking around the village, and saying, "I want to talk to this person who looks like the blacksmith.  He looks like a young guy my age and I saw him a few times at the same pub where we all had beer.  I know we're going to get along so I'm going to start talking to him".  The idea of now being able to prioritize the people in that six-billion-person network so that you know how many degrees of similarity there are between you and them; that's something we've lost when we opened up access to six billion people.  Being able to build these networks, based on what you're doing, your mobile activity or location, helps you understand that network and where you fit in that web of six billion individuals.

I certainly agree that it's a big input parameter to work that out.  So, as you said, we've built electrical connectivity between people.  That was a fantastic engineering feat of the twentieth century, to achieve bit flow around the world, to achieve electrical paths, circuits between people so you could pick up the telephonic receiver and place a call from one end of the planet to the other end of the planet. 

Now, we take that for granted, that physical connectivity between devices.  The new value going forwards, which is many factors more value, is in what I call sociological connectivity, relevant people knowing what's behind the handset.  I think you're going to find the space you're heading into is going to produce many, what I'll call, "Googles".  I wish you a lot of luck.  I'll not ask you another question, tonight.

Thank you for the good luck wishes.  It's true; it's a very exciting arena.  I think there is a lot to learn and a lot of real, human discovery around the corner, where we're going to understand better things about people, what we're doing, and how we can help them.  I think that keeps it all very exciting.  It's not just about dollars, but it's also about the social sciences, as well.

I think it's transformative, all the way to the fundamentals of society, how we interact with people daily, how we shop, commerce, how we're governed.  The impact is right to the fundamentals of society.  That's why it's so exciting.  You're not just seeing incredible monetary opp

eComm 2009 Conference


About this Entry

This page contains a single entry by Lee S Dryburgh published on February 12, 2009 4:49 AM.
