CheckInside: A Fine-grained Indoor Location-based Social Network

I am going to share with you our CheckInside system, where we built a fine-grained, indoor, location-based social network. As we know, current
location-based social networks like Foursquare, Google+ with
location sharing and Facebook Places are designed
to work anywhere, whether indoors or outdoors. And in order to estimate the user location, they usually use one of these methods. They may use GPS; however, GPS does not work indoors due to the lack of line-of-sight conditions. Or they depend on cellular-based localization, which gives accuracies on the order of kilometers. And the best accuracy you can achieve is through Wi-Fi, which, based on our experiments, as I will show, gives a median error of 84 meters indoors in the Foursquare case. Clearly, this is not accurate enough for indoor applications. However, as we know, as humans we spend most of our time indoors. Some of the studies show
that 89% of our time is spent in an indoor
location, whether it’s inside a campus, shopping
malls, or airports. So in order to
quantify or evaluate the current location-based
social networks performance in
indoor applications, we performed a study
where we collected data from more than 700 stores in
four different malls using 20 participants over
a period of six weeks. And our main location-based
social network was Foursquare. And our goal from this study
is to quantify the performance in terms of coverage
and the quality. And by coverage we mean, what
is the percentage of locations covered indoors by Foursquare? So let’s see some of
the results we got. First, we found that
Foursquare misses about 59% of the venues inside the four
malls we collected data from. From these, about 35%
don’t exist at all in the Foursquare database, and 4% are listed at a different granularity. By this I mean that a restaurant, for example, with a specific name is not listed by its name but listed simply as a restaurant: coarse-grained information about the location. Another interesting finding we got is that there is redundancy
in the database. So we found out about
8% of the locations are repeated– same locations
repeated with a different name. And we believe the reason for that is that the check-in list returned to the user, from which she selects where she is currently standing, is not ranked accurately enough. So the user has to scroll the list, and instead of scrolling, users just give up and enter the same location again with a slightly different name. This causes the 8% redundancy. So this is in terms of coverage. In terms of quality–
and by quality we mean: how do we rank the check-in list, and how far is the current user location from the returned location? This figure shows the CDF of the rank of the actual venue in the returned check-in list; the red curve is the CDF. And we can see that in Foursquare, 47% of the time the actual place the user is standing at is ranked at position 30 or worse. And this explains why we
have duplicates– at least partially– why
we have duplicates in the check-in list. In addition, we found that 6% of the locations reported in the check-in list by Foursquare are outdoor locations, even though I am standing inside the mall– and this is when Wi-Fi is turned on. When Wi-Fi is turned off, the location accuracy is even worse, and this 6% increases to 24% of outdoor locations returned while I am indoors. In terms of distance–
so this CDF compares the difference in distance
between the location I’m currently standing at and
the location of the top item on the list. We found that the median error
in the case of Foursquare is 84 meters. So in summary, what we
get out of this study is that we need a more
accurate ranking algorithm for the check-in list for
location-based social networks when social networks
are working indoors. This will give a
better user experience so that the user doesn’t
have to scroll in the list, and it will also reduce
the duplicate entries in the database. In addition, since the coverage
shows that Foursquare misses about 59% of the
indoor locations, it would be good if
we can automatically increase the coverage of
location-based social networks through some
inference techniques. And this will also reduce
the granularity mismatch. If I can detect the
place automatically, I can reduce the
granularity mismatch. So our solution to this problem is our CheckInside system. What we do is try to identify the venue using ideas beyond the estimated location alone. The basic idea is to leverage the cell phone's sensors: we use the data collected from the sensors available on the phone, in a crowd-sourcing approach, to build a fingerprint for each location. Different locations have unique signatures in the sound, images, and inertial sensors, as we will show in the details
of the presentation. We combine this with information
extracted from social networks to obtain a semantic
fingerprint for that location. And hopefully by this semantic
fingerprint of the location we can get a better
or more accurate ranking of the venues
in the check-in list. In addition, since the user
chooses where she is currently standing, we use this
implicit feedback to enhance the
performance of our system. And as a side product we can automatically detect and label the different venues on a floorplan without getting any explicit
information from the user. As nice as it may sound,
there are a number of challenges that
need to be met in order to realize this system. First, as we know, indoor
localization is inaccurate. The accuracy is in the
order of a few meters, which can place you on a
different venue than where you are currently standing. And this will reduce the
accuracy of the system, so we need to handle
the inaccuracy of the indoor localization
technique that we are using. Another challenge is that
current location-based social networks give incentives
for users to fake their check-in location. So, for example, in Foursquare, you have the mayorship paradigm. In other social networks, you
can give coupons to the user based on where
she’s checking in. So this gives
incentives for the users to check in at
different locations or give an erroneous check-in. So our system needs to handle
this erroneous check-in as part of its operation. In the rest of my
talk, I’ll give the details of the
CheckInside system operations and then show you some
performance evaluation results, and finally conclude and give
directions for future work. CheckInside consists of two main components: the CheckInside client
that runs on the user phone and collects data and ships
them to the CheckInside server that contains the core
component of the system and performs the different
system functionalities. So let’s start by the
CheckInside client. Whenever we are collecting
data from the user, of course, privacy is a main concern. So we have this
privacy control module where the user can determine
the mode of operation. Of course, there is a tradeoff
between system performance and user privacy. So what we do is that we
allow the user to choose different modes of operation. In the full operation mode,
we collect all sensors if the user wants to do that. However, we have a privacy
sensitive mode of operation where the user can elect to
turn off the mic or the cameras to obtain better privacy. I would like to note here
that some studies show that even though people
talk about privacy, 78% of the time,
the camera and mic are turned on on the
phones of the users. You can find the references
in our Ubicomp paper on our website. In addition, you can perform local processing on the phone, or on a nearby user laptop, where we extract the features from the sensors and send only the features rather than the actual images or video, which further enhances the user's privacy. So once the user selects her privacy mode, we start collecting the different sensors from the phone. Mainly we collect most of the available sensor information: the inertial sensors, which we use for localization and for mobility mode detection, the camera, the microphone, and so on. Again, there is also an
energy consumption concern. We don’t want to leave the phone
sensors running all the time, or we’ll drain the
battery quickly. So what we do is that we have this module, the same-venue determination module. The goal of this module is to tell us whether the user is still in the same venue or has moved to another venue; during the transition period, the sensors are turned off. The way we do this is by combining the inertial sensors with a Wi-Fi similarity threshold. And once we know that the user is standing in the same venue, we start collecting the sensor information to prepare it for sending to the CheckInside server. So the user now is
in a specific venue, and she wants to check in; she clicks to request the check-in list. So what we do is ship the sensor information through the communication manager to the CheckInside server. The first thing we do is construct a venue fingerprint. This is a fingerprint based on the different sensors collected from the user, used to try to identify where the user is standing– not based on the location alone but based on the semantic fingerprint
of the sensor information. Once we are done with this (the yellow module), the blue modules compare this fingerprint against the fingerprints of the venues stored in our database. This stored information is collected from the data we get from the users as well as from the information we get from the current location-based social networks. Once we obtain this more accurate check-in list through our technique, we send it back to the user, and the user, hopefully, will find her venue in the top-ranked locations, as we will see in the evaluation results. She then does the check-in operation by selecting which venue she wants to check in to. This information is sent back to the system, and, through the user feedback module, it is treated as a [INAUDIBLE]. We can enhance the internal operation of our system, and we can also obtain the semantic floorplan automatically. So I have a floorplan. The user is telling me I am,
for example, in Starbucks. I can put the label of
Starbucks on the map. However, of course,
there are some challenges that I will talk about when
I talk about the details. So what I will do now is talk about how we get the different fingerprints from the different sensors, how we do the ranking and rank aggregation, and, finally, talk about the semantic floorplan labeling. Let's start. So we get our fingerprint from these different kinds of sensors: we have location, Wi-Fi, image, sound, color of light, and mobility data. Let's take each one of them one by one. So for our location determination technique, we use our UnLoc system. I believe I presented details
about the UnLoc system in my Google Talk last year. So if you want more information,
you can find it in our papers or in the Google Tech Talk. So the basic idea is that UnLoc uses a dead-reckoning approach: using the inertial sensors, it dead-reckons the user's location to obtain where the user is standing. However, as we know, the dead-reckoning error increases quickly with time. So to reset this error, we leverage what we call anchor points. These anchors can be physical, like elevators, stairs, and so on: if I know that the user is currently in the elevator, using the phone sensors, I can reset the user's location to the location of the elevator. Or they can be virtual anchors– unique points in the environment that don't have a physical meaning. Again, if I meet one of these virtual anchors, learned through unsupervised learning techniques, I can correct my dead-reckoning error. UnLoc gives us an error on the order of a few meters. So this is our indoor location determination technology, based on our UnLoc system.
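The dead-reckoning-with-anchors idea can be sketched roughly as follows. This is a minimal illustration, not the actual UnLoc implementation; the step lengths, headings, snap radius, and anchor positions are all made-up assumptions:

```python
import math

def dead_reckon(start, steps, anchors, snap_radius=3.0):
    """Track an (x, y) position by dead reckoning, resetting drift at anchors.

    start: initial (x, y) in meters.
    steps: list of (step_length_m, heading_rad) from the inertial sensors.
    anchors: known (x, y) positions of physical anchors (elevators, stairs).
    Whenever the estimate comes within snap_radius of an anchor, the
    accumulated dead-reckoning error is reset by snapping to the anchor.
    """
    x, y = start
    for length, heading in steps:
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        for ax, ay in anchors:
            if math.hypot(x - ax, y - ay) <= snap_radius:
                x, y = ax, ay  # reset accumulated drift at the anchor
                break
    return (x, y)
```

In the real system the anchor detection itself comes from the phone sensors (e.g., the elevator's distinctive acceleration pattern) rather than from proximity alone.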
For Wi-Fi, one challenge that we need to handle is the heterogeneity of the phones: different phones measure the signal strength differently. So instead of depending on the raw RSSI, we depend on the fraction of time each MAC address (access point) is heard within a fixed window of time. This is our Wi-Fi fingerprint, and this feature is more robust to the heterogeneity of devices.
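A minimal sketch of this fraction-of-time feature; the scan format and windowing here are assumptions for illustration, not the system's exact pipeline:

```python
from collections import Counter

def wifi_fingerprint(scans):
    """Build a device-robust Wi-Fi fingerprint from a window of scans.

    Instead of raw RSSI (which varies across phone models), the feature is
    the fraction of scans in which each access-point MAC address is heard.
    `scans` is a list of scans; each scan is an iterable of MAC strings.
    """
    counts = Counter(mac for scan in scans for mac in set(scan))
    n = len(scans)
    return {mac: c / n for mac, c in counts.items()}
```

Two fingerprints built this way can then be compared with any standard vector similarity over the shared MAC addresses.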
For the image, the idea is that different shops or venues will, hopefully, have different images taken by the users, and these can be used to differentiate between different venues. We use standard computer vision features like SIFT features. However, these features are computationally intensive and require a lot of memory. So we cluster these features and obtain what is a classical computer vision term: a Visterm, a visual term. And we store these visual terms as representing the different images we collected, either from Foursquare or from the users through the check-in operation. Similarly, for the sound, we
collect sound information, and our feature for the sound is a histogram of the amplitude. The intuition here is that different shops differ in sound: if you are in a library, for example, the sound fingerprint would be different from that of a music store, at least in terms of the background noise and the sound level in the different venues. Again, the color of light can be used for differentiation. Think about it: Starbucks, for example, has a green theme. So based on this, we can try to extract the dominant color theme of different venues. To do that, we convert the images from the RGB domain to the HSL domain, which is more robust for extracting the dominant light, and then we perform clustering over the different colors; our feature is the [INAUDIBLE] of the different locations and the cluster sizes.
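A rough sketch of the dominant-color idea, using Python's colorsys for the RGB-to-HSL conversion and simple hue quantization as a crude stand-in for the clustering step; the bin count and pixel format are assumptions:

```python
import colorsys
from collections import Counter

def dominant_hue(pixels, bins=12):
    """Estimate a venue's dominant color theme from image pixels.

    Converts RGB pixels (0..255 per channel) to HSL hue, quantizes hues
    into `bins` buckets (a stand-in for real clustering), and returns the
    most common bucket together with its share of the pixels.
    """
    counts = Counter()
    for r, g, b in pixels:
        h, _l, _s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
        counts[int(h * bins) % bins] += 1
    bucket, n = counts.most_common(1)[0]
    return bucket, n / len(pixels)
```

The returned (bucket, share) pair plays the role of the dominant-theme feature; a green-themed venue and a red-themed venue land in different buckets.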
Finally, we also use mobility data as a feature that differentiates the different venues. We have three different mobility features. First, user activity: is the user active or not? For example, if I am in a clothing store, I am browsing more frequently than if I were sitting in a restaurant. So we divide the user's moving time by the stationary time, and based on thresholds, we determine whether the user is stationary, browsing, or walking.
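As a toy sketch of this activity feature; the threshold values here are invented for illustration, not taken from the system:

```python
def mobility_state(moving_time, stationary_time, browse_lo=0.25, browse_hi=1.5):
    """Classify coarse user activity inside a venue.

    Uses the ratio of time spent moving to time spent stationary, with two
    thresholds (made-up values) separating 'stationary', 'browsing', and
    'walking', as a simple per-venue mobility feature.
    """
    ratio = moving_time / max(stationary_time, 1e-9)
    if ratio < browse_lo:
        return "stationary"
    if ratio < browse_hi:
        return "browsing"
    return "walking"
```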
Similarly, the time of day can differentiate different shops. For example, you most probably visit restaurants in the morning, during lunch time, or during dinner time; this is different from visiting other kinds of shops. And, finally, stay duration: you stay in a
restaurant more time than you stay in a
coffee shop, for example. So based on these features,
we extract some information and we use this to quantify
the different venues. Once this is done,
the yellow fingerprint is the fingerprint of where
the user is currently standing. So what we do in the next
module, the venue ranker, is we compare this fingerprint
to the fingerprints of the different venues
registered in our database. And we do this
through three steps. First is filtering. By this, we just
filter out venues that are far away from
the current user location, based on location and Wi-Fi. Then, in ranking, for each kind of sensor we generate a ranked list of the candidate venues the user may be in. And, finally, the rank aggregation module combines these different ranks into one final rank. So let's quickly take
each one of them one by one. So filtering returns a
fixed number of venues where the user
can be located at. By default, we take 10 as the
number of venues we return. And we do filtering by location and by Wi-Fi. By location, it's simply
based on the distance. So we compare the
estimated location of the user with the center of
mass of the venues on the map. And we use the walking
distance rather than the Euclidean distance
for a better estimate of the distance between
the user and the venue. For Wi-Fi, we use a standard Wi-Fi similarity measure: we rank the different venues based on their similarity with the current Wi-Fi fingerprint of the user, and we choose the top 10 locations to return as the filtered locations. For ranking, each type
of sensor provides us a ranked list of the candidate user venues. So, for example, using the sound: as we said, our sound feature is the histogram of the amplitude. So what we do is compute the Euclidean distance between the histogram at the current user location and the histogram stored for the venue, and we take this as the similarity measure between the venue and the user's current location.
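A small sketch of this sound feature and its distance; the bin count and the assumption of amplitudes normalized to [-1, 1] are illustrative choices:

```python
import math

def amplitude_histogram(samples, bins=8):
    """Normalized histogram of absolute amplitudes (samples in [-1, 1])."""
    hist = [0] * bins
    for s in samples:
        idx = min(int(abs(s) * bins), bins - 1)
        hist[idx] += 1
    total = len(samples)
    return [c / total for c in hist]

def sound_distance(hist_a, hist_b):
    """Euclidean distance between amplitude histograms (lower = more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(hist_a, hist_b)))
```

A quiet venue (a library) and a loud one (a music store) concentrate mass in different bins, so their histograms end up far apart.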
For the image features, we said we have Visterms. Visterms can be treated as terms inside a document, where the document is the image and the Visterm is the visual term inside this image. If you do this, you can use the standard inverse document frequency technique to get the most probable venues the user may be standing at.
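The Visterms-as-terms idea can be sketched with a plain TF-IDF scorer. The venue data and the exact scoring formula here are illustrative assumptions, not the paper's formulation:

```python
import math
from collections import Counter

def rank_venues_by_visterms(query_visterms, venue_visterms):
    """Rank venues by TF-IDF similarity of their Visterms.

    Treats each venue's bag of Visterms as a 'document' and the query
    image's Visterms as a query; scores with a simple TF-IDF sum and
    returns venue names ordered best-first.
    """
    n = len(venue_visterms)
    df = Counter()  # in how many venues each Visterm appears
    for terms in venue_visterms.values():
        df.update(set(terms))
    scores = {}
    for venue, terms in venue_visterms.items():
        tf = Counter(terms)
        score = 0.0
        for t in query_visterms:
            if t in tf:
                score += tf[t] * math.log((1 + n) / (1 + df[t]))
        scores[venue] = score
    return sorted(scores, key=scores.get, reverse=True)
```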
Similarly for the popularity ranking and the other kinds of sensors– I will not go into the details; for the sake of time, you can look them up in the paper. Once the ranking is done–
so for each type of sensor, we have a ranked list of candidate locations. We need to aggregate them to obtain the final, single ranked list that will be returned to the user. Of course, we can base the aggregation either on the scores of the different lists, or on the order within the lists, ignoring the scores. In our system, we found that the order is more robust, and we believe that this is because of the high variance between the different rankers' scores. So what we do is use just the order: we sum the orders across the different rankers, and based on this sum, we determine the final order of the list to be returned to the user.
So once this single ranked list is obtained, we send it back to the user. And, hopefully, this list is accurate, and the venue the user is currently located at will be among the top locations. The user selects which venue she wants to check in to, and this is sent back to the user feedback module and to the semantic label estimation module. So the user feedback module
takes this implicit feedback of the user about where she is currently standing and re-weights the different rankers: the rankers that ranked this ground truth venue higher get a higher weight. By this, we enhance the system operation over time, giving the different rankers different weights based on the implicit feedback we get from the user.
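One way to sketch this implicit-feedback re-weighting; the multiplicative update and the reward shape are made-up stand-ins for the scheme in the paper:

```python
def update_weights(weights, rankings, true_venue, lr=0.1):
    """Adjust per-ranker weights from the user's implicit feedback.

    `rankings` maps ranker name -> its ranked venue list for this check-in.
    Rankers that placed the checked-in (ground-truth) venue near the top
    are rewarded, then the weights are renormalized to sum to one.
    """
    new = {}
    for name, ranking in rankings.items():
        pos = ranking.index(true_venue) if true_venue in ranking else len(ranking)
        reward = 1.0 / (1.0 + pos)  # 1.0 at rank one, decaying below
        new[name] = weights[name] * (1.0 + lr * reward)
    total = sum(new.values())
    return {name: w / total for name, w in new.items()}
```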
Finally and interestingly, what we can also do is associate the labels. So currently the user is saying, I am at Starbucks, and I have an estimate of the user's indoor location, so I can associate this Starbucks label with where Starbucks is located on the map. Of course, as we mentioned before, there are inherent errors in indoor localization. And there are also fake check-ins: the user can perform an erroneous check-in to gain some benefits. So what we do for this is use an unsupervised outlier detection technique– basically, hierarchical clustering. So for all check-ins within
Starbucks, what we do is cluster these check-ins using hierarchical clustering techniques. One of the main things we need to do is determine at what level to cut the hierarchical clustering to obtain the clusters, and we do this using Bayesian decision estimation. After this operation, we have different clusters, and we need to choose which cluster represents the correct check-ins within the venue. One could say we can do majority voting. However, when we are bootstrapping our system, we don't have enough check-ins to determine the cluster based on majority votes. So instead we use another heuristic: the correct cluster is the one that is most similar to those of nearby venues, since nearby venues usually cluster similarly. This gives us the main cluster, which is the cluster that minimizes the difference between itself and nearby clusters. So after this step, we have the cluster of check-in locations that is the most probable one, and we need to map it to the actual venue on the floorplan. This is estimated as the venue that encloses the center of mass of the check-ins within this particular cluster. More details about the mathematical formulation can be found in the paper.
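The final mapping step can be sketched like this; rectangular venue footprints are a simplification for illustration, where a real floorplan would use polygons:

```python
def center_of_mass(points):
    """Mean (x, y) of a cluster of check-in locations."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def enclosing_venue(point, venue_boxes):
    """Return the venue whose floorplan rectangle encloses the point.

    venue_boxes maps venue name -> (xmin, ymin, xmax, ymax); the label
    of the winning cluster's center of mass is attached to this venue.
    """
    x, y = point
    for name, (x0, y0, x1, y1) in venue_boxes.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None
```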
So this concludes the details of the system. So how well did we perform? We used the same data we collected from the different malls: about 700 venues in four malls, in two cities, over a six-week duration, using 20 participants. And this table shows the data we
found for these particular four malls in Foursquare. We can see that out
of the 711 venues, we found only about 436 venues
in the Foursquare database. And the rest are uncovered. And of course, based
on the type of venue, you have different coverage. The most popular type is food places, so it's more covered than other kinds of venues. So what we'll show: we'll start
by showing the performance of the different
system modules, then the performance of different
modes of operation. And finally, we compare the
overall system performance with Foursquare. Let’s start with filtering. As we said, we filter based on
Wi-Fi and location information. And we can say that CheckInside
can locate the actual user venue 100% of the time
within a list of 15. So if your output list
is 15, I can correctly identify the location
of the actual user venue 100% of the time. This is compared to
only 29% for Foursquare, given the same list. So this is significant
enhancement just based on filtering
without any ranking. If I now add ranking, this figure shows the performance of the different rankers in terms of ranking the actual venue. You can see from this figure
that Wi-Fi is the best ranker. It can give the best performance
in ranking in the venue, followed by location. And the least performing,
in terms of ranking, is the mobility data. And we believe the
reason for this is that, usually, more
similar venues are located in the same area. For example, if you think about the food court: all the food venues are inside the same area, and users behave similarly in all of them, which reduces the discriminative effect of the mobility ranker. However, based on our
CheckInside system, we can rank the actual
venue the user is standing at in the top five
99% of the time. This is, again, compared to Foursquare, where, as we showed, 47% of the time the actual user venue is ranked worse than position 30. Again, this is
significant enhancement. For the user feedback module,
the figure on the left shows that starting
from equal weights to the different
rankers, over time the user feedback
module can stabilize to the actual weight of
the different modules. Using just as low
as 30 check-ins, I can reach the
actual, or the correct, weight for the
different modules. If I use this feedback
in ranking the modules, the rank of the actual
venue at the top of the list enhances from 70% to 83%. So CheckInside can rank the
actual venue of the user at the top location of the list
as number one 83% of the time. These are very
promising results. Finally– please. AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF:
Yes, we collected in a crowd-sourcing approach. AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: In a
crowd-sourcing approach, yeah. AUDIENCE: Oh. Then do you actually do
something [INAUDIBLE]? MOUSTAFA YOUSSEF: Yes. So we have different
kinds of rankers. Each ranker is
based on one sensor. And the user feedback
module, what it does is that it tells me which
ranker should be trusted more than the other rankers. So as I showed
here, for example, Wi-Fi gives better performance
compared to localization and then by image. And what we see in the user feedback module is that this is the actual weight we get– you can see that it gives Wi-Fi a better weight than localization and so on. Based on the user feedback– since the user is telling me I am currently in Starbucks– the ranker that ranks Starbucks higher should get a higher weight. This is global, actually. One of the extensions
we are working on now is to do it per user. So maybe different users have better rankers– this is a very good point, exactly. Or per venue. It's per user or per
venue, same thing. Very good point. Great. Finally, for the semantic labeling, what we did here is induce artificial errors in the check-in operation. All the check-ins were performed
by our 20 participants. And they were correct,
so what we did is we induced erroneous
check-ins from 0% up to 100% check-in error. And we are comparing
CheckInside, which is the green
curve, to two extremes. One of them is no detection
if I don’t do any outlier detection, which is
the purple curve. And the other is
the orange curve. It’s the one that knows
which check-ins are correct and which check-ins are
incorrect in a global manner. And we can see that, for different percentages of check-in errors, CheckInside provides from 9% to 19% enhancement in the semantic detection
of the floorplan. I would like to note here that even if you don't have any check-in errors, you still get only 83% accuracy, and this is due to the inherent indoor localization error. This error is not because
of the fake check-ins, but it’s due to the inherent
inaccuracy of the location determination technology. So in the next part, I’ll
evaluate the different modes of operation. So the CheckInside
curve is the orange one. It’s the system that was using
all the sensors in the phone. The green one is the
privacy enhancing mode, where we don’t use
any mic or image data. And the purple curve, if I
use just indoor localization. I don’t use any other sensors. So this is the simplest
idea– just use a more accurate indoor
localization technique to detect where the
user is standing. And we can find that using the
different sensors in the phone, CheckInside can provide about
55% enhancement in the ranking of the actual, or
the top, venue, compared to the
location-only information. So just using
location-only, you are losing a lot of possibilities
to enhance your accuracy. Finally, we compare CheckInside to Foursquare in terms of accuracy and coverage, as we set out to do in our study. This figure compares the orange curve (CheckInside) to Foursquare in terms of ranking the actual venue. And we can see that in our system the actual venue is ranked at the top of the list– at rank one– 83% of the time, as we showed, while for Foursquare, it's about 5%. So the venue the
user is standing at is in the top place for
Foursquare only 5% of the time compared to 83% of
the time in our case. The lower figure shows
the distance error. We showed before that the median
distance error for Foursquare is 84 meters. In our case, it’s
about five meters. This is a significant
enhancement in the performance of
location-based social networks. In terms of coverage, what we did is take the fingerprints of the venues we collected in one mall, and we tried to see if we can use these fingerprints to detect venues or shops in our other malls. So we used the fingerprints from one mall and tried to detect the venues in another mall. And using this– of course, it
depends on the type of venue, but overall we can increase the
coverage or detect about 25% more venues than the venues
stored in the Foursquare database without any prior
calibration in the new mall, based on the fingerprints
in the other malls. Sure. So all the fingerprints– exactly. Wi-Fi, of course, changes from one place to another, and so does the location. So excluding Wi-Fi and location, it's mainly the image, mic, color, and mobility data. Very good observation. Moving on. So in summary, I showed you the CheckInside system, which leverages crowd-sourced information to implement a fine-grained location-based social network. And the idea is to construct
a semantic fingerprint in an automatic way for
the different venues. And this semantic
fingerprint helps us to more accurately rank
the different locations. We showed that we can achieve accurate ranking– within the top five locations 95% of the time– and at the same time increase the coverage of indoor location-based social networks by 25% without any explicit help from the user. And as a side benefit, we could semantically label the floorplan, automatically obtaining semantically rich floorplans. Currently, we are expanding the
system in different directions, including better semantic
floorplan labeling through better outlier
detection techniques, also with other
semantic granularities. So if I have fingerprints for different restaurants, can I get a higher-level fingerprint that characterizes restaurants in general? And then I can use this
to identify restaurants in other malls or other venues. Another thing is to
have other applications. For example, instead of having
a check-in list for users to choose from, I can– since
I have accurate estimation of the user venue–
I can perform the check-in through the maps. So the user clicks on the
map where she's standing. And finally, like any other crowd-sourcing approach, energy and privacy still have a lot of room for enhancements within this domain. This concludes my talk. And if you need more
information about our papers, media coverage, incoming demo
and possible commercialization, please go to our website. Thank you very much. [APPLAUSE] Questions, please. AUDIENCE: Can you just
help me understand how you recruit people for this? Could you also say which city? Because sometimes the interface is very, very [INAUDIBLE]. So Foursquare in Downtown
Manhattan versus Foursquare in some other places might
not be quite so good. MOUSTAFA YOUSSEF:
Very good question. So all of this data
is collected in Egypt. It’s for malls– two in
Alexandria, two in Cairo, which are the two
biggest cities in Egypt. Participants were mainly students working with [INAUDIBLE]. So currently, we are trying
it also in other countries. So we are talking about
it with people in the US and people in the region,
like Saudi Arabia, to start deploying this
and collecting data to increase the
size of the study. Yes? AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: Exactly. Very good question. The question is, is this
based on crowd-sourcing? Crowd-sourcing information
keeps changing with time. So currently, of
course, the results are based on the static
data we collected. But the way we envision
it is that we’ll have a continuous stream
of data that is coming. And maybe, take for example, the
latest one month or two months data to extract or
update your fingerprint. AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: Yes. So actually, it’s–
you can do all of this. You can have your fingerprint
as a function of data. What we are doing currently
is that for mobility data, for example, you have
the histogram based on the time of day. So you can say that this
particular location, this is the histogram of visits. The sound, however, is taken over the whole duration of the day. And maybe this is why, for example, some rankers perform worse; if I take the time of day into account, the performance of the ranker could become better, for example. Does that answer your question? AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF:
No, it’s great. AUDIENCE: One question was,
it seems that between Wi-Fi and location, if we go back to the performance stats, they account for most of the performance improvement. MOUSTAFA YOUSSEF: Right. AUDIENCE: The other sensors had an incremental value, mainly helping the second and third positions, but they didn't really help as much. MOUSTAFA YOUSSEF: Right. AUDIENCE: It's a ton of effort. Do you think it's worth it for such a small gain? MOUSTAFA YOUSSEF: So the
point that he’s raising is most of the
enhancement is coming from the Wi-Fi and
localization modules, and the others are a slight
enhancement on top of this. So what we can see
from this is that, yes. Wi-Fi maybe gives
the best performance in ranking the first one. However, in ranking
the second and third, the overall performance
is significantly better if I include other sensors. So I think this is an
application designer's choice. Do you really need it? One can say, I can live with
83% performance in detection; and using just Wi-Fi,
I can get about 65%. Do I need to go from 65 to 83
with the other modules or not? So it depends on
what you need to do. Actually, one piece of feedback
we got from the Ubicomp attendees is
that you are saying, I can achieve this
very high accuracy. Can I automatically check-in the
user if I have this accuracy? So again, if you say I can live
with 17% error, that’s fine. If my application
is very sensitive and I don't want this error, then I need to take the
other sensors into account. Please. AUDIENCE: Hi. This is [INAUDIBLE] the
other role of a [INAUDIBLE] because of [INAUDIBLE]. MOUSTAFA YOUSSEF:
Yes, so this is the difference with the
feedback module. So as we said, the
feedback module increases it from 70% to 83%. So this is without
the feedback module. Please. AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: Good question. I don’t have it off
the top of my head, but I think it wasn’t that much. Because in order to
collect these 711 venues, they had to move quickly
within the venues. So I imagine it would be in the
order of a few minutes– one or two minutes per location to
collect the Wi-Fi fingerprint and take the images. All of this is included in these
few minutes. Maybe you can find more
details inside the paper. Please. AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: I
think it's the way they rank. So the question is, we
mentioned that the median error in Foursquare is 84 meters– and
this is the difference between the center of mass
of where the user is actually standing and the center of
mass of the top ranked venue. So we believe this
is related to how Foursquare ranks
its ranked list. And we believe that– of course,
we don't have information about how they actually
do it, but mainly we found that it’s
based on popularity. So the most popular
locations are placed before the
other locations. And if you do this, even
though your Wi-Fi accuracy may be about 20 or 30 meters,
the way you rank your list is incorrect, so the
top-ranked venue is farther away from the actual location. Some of it also is related
to that, as we mentioned, 60% of the time if you turn on Wi-Fi, or
22% if you turn off Wi-Fi, outdoor locations
are returned, and this can skew the
tail of the distribution and inflate the evaluation errors. Another question here? On the side. Yes, please. AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: So
actually we take– I don't have the exact number
off the top of my head, but if you take one
minute, what you do is
continuous scans, and then take
the percentage of scans in which
you hear that particular access point within this
one-minute period. And you take this
as your fingerprint. No signal strength,
and the reason for that is phone heterogeneity. Different phones can give us
different RSS fingerprints, and so it would not be robust. However, this percentage of
time I hear a specific access point should be independent
of the phone hardware. AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: So it’s how
you compare two signals? No, sorry. Not signals. Two fingerprints, two features. So if I’m standing
at one location, I’m currently hearing
some set of access points. If I have collected
data in my venues, how do I compare these two fingerprints? So you mean it’s based
on signal strengths? So you base it on the–
you have this feature, which
is the percentage of time. You take it as a number,
there is another number, and you then apply
the similarity method. Sure. AUDIENCE: What about
the size of it? MOUSTAFA YOUSSEF: Size of–? AUDIENCE: The size. MOUSTAFA YOUSSEF:
“Size” meaning–? AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: It’s typical. So we have very large
shops, like Macy’s and so on, and you have small shops. So it varies,
of course, for Wi-Fi. That’s a very good question. Wi-Fi, of course, will vary. And that’s why
maybe in some cases Wi-Fi performs more
poorly than localization. That’s why you
need other sensors. So in large shops, like
a big department store, for example, you could split
the Wi-Fi fingerprint into smaller sub-venues. But here we didn’t do that. We just took the entire
shop as one fingerprint. This is what we are
doing currently. But maybe as one
extension, you can split it into smaller venues. Please. AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: Good question. If I understand the
question correctly, you mean how many venues were in
the four malls in total, or–? AUDIENCE: No. How would you actually–
MOUSTAFA YOUSSEF: So we’re saying we have 711 over the
entire four malls. And out of these 711,
there were about 436 already in the
Foursquare database. The remaining roughly 300 were
not in the Foursquare database. And the
filtering and ranking are performed over these 711. Of course, if I
am in a mall, I’m just working on this
specific mall's data. So roughly you can
divide by 4 to get the number of venues per mall.
Of course, that’s why all the results are
based on these specific four malls and this
specific configuration. So it's hard to generalize
from these particular malls. It would be interesting, if you have
more data, to share it with us, and we can work
on it and see how it will perform
at a larger scale. That would be great. Robin? AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF:
So the question is related to energy
consumption concerns. How do you collect the data
you ship to your server? Do you do it all the time, running the
sensors all the time? In this case, it would
be very energy hungry. Or do you do it just when
the user clicks a check-in? And for this, we use the
venue determination module. And the reason
for this module is to tell us that the user
is currently in one venue, and that he is stationary
within this venue. So what happens is that
once you’ve detected that a user is stationary
within a venue, using Wi-Fi similarity
and the inertial sensors, you start collecting sensor data in
the background within a fixed window, so it's updated. So this balances
these two extremes. You are not collecting
data all the time. You are collecting only when it
senses that you are stationary. And you are not collecting
just when the user clicks, because you need
some history in order to do these scans. And
this is currently how we do it, as an initial
approach to this problem. However, I listed it
in the future work, because it’s a very important
and very critical thing. Otherwise, people will
not use your system. If you are collecting
data all the time and the battery
is being drained, they will
uninstall your application. Another question, please. AUDIENCE: [INAUDIBLE]
mentioned that 700 are from your
crowd-sourced [INAUDIBLE] and that Foursquare
overlaps 400 of them. MOUSTAFA YOUSSEF: Yes. AUDIENCE: So do you actually,
for that geographic area, do you know what the surface of
[INAUDIBLE] So I’m wondering, is it the same order
of magnitude as places, or is it actually much higher? MOUSTAFA YOUSSEF: I think it was
proportional to the number of venues that were
in the Foursquare database, if I understand your
question correctly. So it was proportional– AUDIENCE: So you had around
400, 500, that [INAUDIBLE] is considering to
populate their list. MOUSTAFA YOUSSEF: Yes. So, actually, yes. Exactly. And that’s why
actually we get– I think this is all
in the database. I’m not sure about this,
because we calculated coverage. So we said that
39% are uncovered. In order to do
this, we had to be covering all venues
in the Foursquare database. This is the complete list in
the Foursquare database. Ahmed? AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: No. So the question is related to
privacy and energy consumption. In order to balance
this, what you can do is to do the processing on the
phone or a nearby laptop, for example. When I say a nearby
laptop, it’s your own laptop. So if you are working,
maybe it offloads the data to your laptop, does the
processing there, and then sends the results
from your laptop. You can also think about
things like homomorphic
encryption and computing on encrypted data, but it is not mature. One thing here is
that our focus mainly
was on quantifying the
accuracy and the coverage
of the location-based
social networks and how we can
enhance the accuracy. Privacy and energy, of
course, are main concerns for anything
that is deployed. And I think it is one
of the main branches
we need to focus on. AUDIENCE: Help me
understand here, so I can– so was this organic,
in the sense that people were just going about
their normal daily lives, or was this almost like a
survey of like, is that OK? Is anybody going to go
to those older stores? And we’re paid to
work [INAUDIBLE] and look at how we perform on
the data for the week later. And what do we figure out? Can you actually parse it
out just so I understand it? MOUSTAFA YOUSSEF: Sure. So the details I’ll say
roughly from the top of my head, but the details
are in the paper. So, basically, what
we did is, I think,
we split the 20 participants
into groups. And each group was asked
to visit different venues
at different times, so
that we’re covering
different days of the week and
different durations of the day. And they were requested
to do the check-ins
as they do naturally,
in their own way. Whether they did so or
not, we are not sure. But we hope that the mix
of different participants,
the splitting into different groups,
and the coverage of different days
make it more realistic. Of course, the best thing would be if
we could have some real data already. But what we get
from check-ins doesn't include the actual
location as ground truth or so on. So if we can get this
kind of information, we can do a more
realistic study. But this is what we could do
through the resources we had. AUDIENCE: And this to be
about– that’s not what it is. Like, you used different
values in order to–? MOUSTAFA YOUSSEF:
This is combined. It’s all of the data we
collected in all locations. And that’s maybe related
to one of Robin’s questions: maybe the fingerprints
change with the time of day. So we didn’t do this. We just collected it over the
day, and got our fingerprint. If we do that, maybe we
can enhance the performance of different rankers,
and the order of rankers may change based on this. AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF:
We just collected it as one pool of data
and we treated it. AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: Yes, exactly. Exactly. Or maybe we used
cross validation. It’s not on the top
of my head, but I need
to go back to
the paper to make sure: did we do this cross validation,
or just use the same data? But you’ll find the details in the paper.
Right, but if you do it, that may be one– at least
for the semantic labeling
thing, we did this. However, if you want to
use the location and Wi-Fi,
it has to be within
the same mall. But at least for
the semantic labels, we used a fingerprint in one mall
and compared it to other malls. And that’s how we got this
25% enhancement in coverage. Please. AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: That’s
actually a good question. My answer was going
to be that maybe this is the result of
random selection. But this is a valid point. Actually, maybe I need to look
into it and see whether a random selection
really gives the same– AUDIENCE: You said you have
research based on the location? MOUSTAFA YOUSSEF: Yes. AUDIENCE: Does that
happen before this? MOUSTAFA YOUSSEF: Yes, it does. So actually– yes. AUDIENCE: OK, so it– MOUSTAFA YOUSSEF: Good point. So you filter by location, and
this filter gives you a list of about 10 venues. And then you do a relative
ranking within this list. So it may be a lot
like what Brian is saying: if you have 10
locations and you rank them at random, it's 10%– exactly. So that’s why it’s–
So, again, it’s related to our UnLoc paper. And the way we do it is you
take a crowd-sourcing approach. So what we do is that,
during people's normal operation, the phone is collecting data
and sending it to our servers. And what we do is,
through dead-reckoning, you have an estimate
of the user location. It is not accurate. So what we do is we
reset this error. It’s a chicken-and-egg problem. You use anchors to
enhance your location, and your location is used to
estimate the anchor locations. So it fits, actually, the
semantic SLAM problem. And this is our recent
paper about enhancing UnLoc using the SLAM technique,
the semantic SLAM paper. Maybe I shared it
with the group before. Yeah? Please. AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: So
we have as an input the floorplan of the venues of
interest, without any labels. It’s just the outlines
or the borders of the different venues. And based on this, we do the
walking-distance estimation and automatic labeling
with some of the modules. We have another
piece of work, which is the CrowdInside paper,
where we are automatically estimating the floorplan shape. So if you combine these two
pieces of work, what you can do is get the floorplan, at
least the borders of the rooms and the corridors, in a
crowd-sourcing approach also. So, here, our focus wasn’t on
the floorplan construction. It was on the location-based
social network. But if you combine these
two pieces together, you can do it in a
completely automatic way. Of course, it will have
an effect on the accuracy, but you need to
balance the two things. AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: That’s
a hard question, yeah. So even in our
CrowdInside system, what we did is that we
evaluated it in a small setting– it was on our campus. It was a small building. You could get 100% accuracy
if you had about 300 traces from all users that
use the building. However, we didn’t test
it in a large mall. If you do it in a large
mall, usually the accuracy would be lower, and there
would be more challenges. But I cannot give a number
now off the top of my head. Hm? AUDIENCE: I’m really
interested in the way that you combined all
these groups of data. I like that you did something
people seem to like. You can almost go up and
just thank them and then just pad out the rest. MOUSTAFA YOUSSEF: Right. AUDIENCE: But that also
opens up some really interesting possibilities around
traditional [INAUDIBLE]. I think there are some
situations where the Wi-Fi is ambiguous but the location
is not– or the Wi-Fi is ambiguous, and the images are
[INAUDIBLE]. My question is, how much error do
you think there is? How much better do
you think you could do if you did more
intelligent combinations of these simple steps? MOUSTAFA YOUSSEF: Actually, we
tried the score-based method, but our issue was that
there is a normalization problem between the different rankers. The range of scores for one ranker is
different from the range for another. Even if we did
[INAUDIBLE] normalization, the order-based technique was
better in all cases we tried. But maybe if we try
other techniques for fusing these
different scores together, maybe we can get– Our intuition at the beginning
is that if you use a score, it’s more information, so you should get better results. But it didn’t work out,
at least with the different functions
we tried here. Maybe also the user feedback
module can help in this. If we
fix it per venue or per user, it can tell us that for
this specific venue, mostly Wi-Fi is
better than location. So if we take this particular
profile-based weighting, maybe you can also handle
these particular points. Robin? AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: Yes. I think that
it’s a valid point. Of course, in our study,
we are collecting all data, because the participants were– AUDIENCE: Oh, they
were students. MOUSTAFA YOUSSEF: Exactly,
students and their friends. But, in reality,
how would it work? This is a nice observation. From the references
we cite, Wi-Fi is enabled 79% of the time. But it’s a nice question. We are here at Google, so what percentage of– AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: All right. I think this is a
very valid question. And I think maybe
one of the figures here may provide
a hint about it. So here, for example,
we are showing
location-only as compared
to having all the
sensor information. So this is still
better than Foursquare without using any
indoor localization, but you have this extra 54%. So I think that’s the
tradeoff you’re talking about. If you require more
participation from the user, you are moving from this
scale to this scale– maybe. AUDIENCE: [INAUDIBLE]
none of us is [INAUDIBLE] MOUSTAFA YOUSSEF: So I think
this is the green curve. So, actually, the green
curve is for the privacy-concerned case, which is without the
mic and the images. AUDIENCE: Without the mic? MOUSTAFA YOUSSEF: Yes. So without them you are losing–
it’s still better than location-only, using the other sensors,
but you’re still not getting the full
potential of the system. Of course, this is a very
interesting direction to look into. AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: And for a
good cause, not a commercial. Not a commercial. AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: All right. At least from the
figures– for the user feedback module
we have this figure. We’re talking about
30 check-ins in order to stabilize the performance
of the user feedback module. So I don’t think it’s
much, and if you’re talking about
crowd-sourcing sensor data from thousands or
millions of users, you can get this quickly,
within a couple of days if not a couple of hours. So I think it shouldn’t be much. But the question
is– and that's why I ask– does Google have any statistics they
can share on how many people share their data or don’t
care about it that much? I’m not sure if they can
share this with us or not. OK, but at least you have an
idea of what’s going on there. AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: Overall. Overall. This is for the user
feedback module. The user feedback module. AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: It’s
the weight, not– yeah. To get the actual weights
over the entire malls, for the four different
malls we used in our data. AUDIENCE: [INAUDIBLE] MOUSTAFA YOUSSEF: Yeah, if you take into account
the number of users that use the
mall and the percentage of them participating in
your system, I think you can quickly get saturated
at the performance you are looking for. Hard question, great question. All are great questions. Yes? Thank you very much. [APPLAUSE]
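The Wi-Fi fingerprinting scheme described in the Q&A – take continuous scans over roughly a one-minute window, record the fraction of scans in which each access point is heard (no signal strength, for robustness to phone heterogeneity), and compare fingerprints with a similarity measure to rank candidate venues – can be sketched as follows. This is a minimal illustration, not the CheckInside implementation: the talk does not specify the similarity function, so cosine similarity is assumed here, and all function names are hypothetical.

```python
from collections import Counter

def build_fingerprint(scans):
    """Build a venue fingerprint from a series of Wi-Fi scans.

    Each scan is the set of access-point identifiers heard in one scan
    cycle. The fingerprint maps each AP to the fraction of scans in
    which it was heard (a detection rate, not a signal strength).
    """
    if not scans:
        return {}
    counts = Counter(ap for scan in scans for ap in set(scan))
    n = len(scans)
    return {ap: c / n for ap, c in counts.items()}

def similarity(fp_a, fp_b):
    """Cosine similarity between two detection-rate fingerprints."""
    common = set(fp_a) & set(fp_b)
    dot = sum(fp_a[ap] * fp_b[ap] for ap in common)
    norm_a = sum(v * v for v in fp_a.values()) ** 0.5
    norm_b = sum(v * v for v in fp_b.values()) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_venues(user_scans, venue_fingerprints):
    """Rank candidate venues by fingerprint similarity, best first."""
    user_fp = build_fingerprint(user_scans)
    return sorted(venue_fingerprints,
                  key=lambda v: similarity(user_fp, venue_fingerprints[v]),
                  reverse=True)
```

In the system described in the talk, this Wi-Fi ranker would be only one of several rankers (location, images, sound, mobility) whose ordered lists are then fused.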
