Day three of WWW2013.
The main conference starts today.
Every session starts in the afternoon, which I take to mean: go out and have fun all morning.
Formidable WWW, formidable Brazil.
Keynote
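Luis von Ahn
Duolingo translation
human computation
The person who created CAPTCHA
Achieve some goal by harnessing things people already do for their own purposes.
Is this an idea related to shikakeology, or just a currently fashionable way of thinking?
It was interesting, but he spoke so fast that there were many parts I couldn't follow.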
Social web engineering
Pick-A-Crowd: Tell Me What You Like, and I’ll Tell You What to Do
Crowdsourcing
The pull methodology is suboptimal
Push Methodology : Task-to-Worker Recommender
Pick-A-Crowd : A system architecture that uses Task-to-Worker
Matching tasks to users
Rank workers using their Facebook Likes
Worker precision tends to be higher in categories where the worker has many Likes
"Worker knows what they like"
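To make the push idea concrete for myself, here is a minimal sketch of ranking workers for a task by how many of their Likes fall in the task's category. The function name `rank_workers`, the data layout, and the sample workers are all my own assumptions for illustration; the actual Pick-A-Crowd system is surely more elaborate.

```python
from collections import Counter


def rank_workers(task_category, worker_likes):
    """Rank workers by how many of their liked pages match the task category.

    worker_likes maps a worker id to a list of category labels,
    one label per liked page (a hypothetical layout).
    """
    scores = {}
    for worker, categories in worker_likes.items():
        scores[worker] = Counter(categories)[task_category]
    # Highest like count first; ties broken by worker id for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))


# Made-up example data.
worker_likes = {
    "alice": ["soccer", "soccer", "music"],
    "bob": ["music", "movies"],
    "carol": ["soccer", "movies", "movies"],
}
ranking = rank_workers("soccer", worker_likes)
```

A task would then be pushed to the first few workers in `ranking` instead of waiting for anyone to pull it.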
Groundhog Day: Near-Duplicate Detection on Twitter
There is a lot of similar information on Twitter. = Duplicate Content
The goal is to find similar posts
1. Exact copy
2. Nearly exact copy
completely similar except hashtag and url
3. Strong near duplicate
Same core message
4. Weak near-duplicate
Same core message, one tweet contains personal views
Convey semantically same, differing information
5. Low Overlap
Semantically same
- 20% duplicates in the search results
Classification strategies => different feature combination
- Accuracy of detecting duplicates
- What kind of features are important to detect
49% precision, 45% recall
67,63% to find duplicate levels
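A rough sketch of how the first few duplicate levels could be told apart using nothing but word overlap. This is not the classifier from the talk, which combined several kinds of features; the thresholds `strong` and `weak` and all function names are my own assumptions. Word overlap also cannot catch the "low overlap" level, where tweets mean the same thing but share few words.

```python
import re

URL_RE = re.compile(r"https?://\S+")
TAG_RE = re.compile(r"[#@]\w+")


def normalize(tweet):
    """Lowercase, drop URLs, hashtags and mentions, then split into words."""
    text = URL_RE.sub(" ", tweet.lower())
    text = TAG_RE.sub(" ", text)
    return re.findall(r"\w+", text)


def jaccard(a, b):
    """Jaccard similarity between two word lists, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def duplicate_level(t1, t2, strong=0.8, weak=0.5):
    """Guess a duplicate level; the thresholds are illustrative assumptions."""
    if t1 == t2:
        return "exact copy"
    w1, w2 = normalize(t1), normalize(t2)
    if w1 == w2:
        return "nearly exact copy"
    overlap = jaccard(w1, w2)
    if overlap >= strong:
        return "strong near-duplicate"
    if overlap >= weak:
        return "weak near-duplicate"
    return "low overlap or unrelated"
```

For instance, two tweets that differ only in their URL and hashtag normalize to the same word list and come out as "nearly exact copy".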
Reactive Crowdsourcing
Crowd Control
Goal: taming the crowd
- cost
- time
- quality
- motivation
I wasn't all that interested in crowdsourcing, so my notes trail off partway through...
Spatio-Temporal Dynamics of Global Social Media
Challenges and opportunities
What are the geo-properties of ideas?
Using hashtags => why hashtags...?
Collect geo-tagged tweets with hashtags
2 billion data
1. Location properties of hashtags
2. Propagation of hashtags
3. Spatial analytics of locations
Hashtag distance vs sharing
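The farther apart two locations are, the less they share the same hashtags
Uses the entropy of a hashtag
If the entropy is high, the hashtag should be spread across many different places
However, entropy does not tell us whether those "different places" are close to each other
So distance is brought in.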
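A small sketch of the entropy idea, assuming each occurrence of a hashtag has already been mapped to a location label (how locations are bucketed is my assumption, not something I noted down). Entropy is 0 when every occurrence comes from one place and grows as occurrences spread evenly over more places.

```python
import math
from collections import Counter


def hashtag_entropy(locations):
    """Shannon entropy (in bits) of a hashtag's distribution over locations."""
    counts = Counter(locations)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical location labels for two hashtags.
local_tag = ["tokyo", "tokyo", "tokyo", "tokyo"]
global_tag = ["tokyo", "rio", "paris", "nyc"]
local_entropy = hashtag_entropy(local_tag)
global_entropy = hashtag_entropy(global_tag)
```

Note that `global_entropy` would be the same whether the four places are neighbouring towns or on four continents, which is exactly why a distance measure is needed as well.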
Hashtag's spread
s = \frac{1}{|O^h|} \sum_{o \in O^h} D(o, G(O^h))
25% of hashtags have spread > 1000 miles
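A sketch of the spread computation, reading O^h as the set of geo-tagged occurrences of hashtag h, G as their geographic midpoint, and D as the distance in miles from an occurrence to that midpoint. These readings are my interpretation of the slide; the haversine distance and the 3D-average midpoint used here are my own choices, not necessarily the authors'.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(p, q):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def geographic_midpoint(points):
    """Midpoint of (lat, lon) points via the average of their 3D unit vectors."""
    x = y = z = 0.0
    for lat, lon in points:
        lat, lon = math.radians(lat), math.radians(lon)
        x += math.cos(lat) * math.cos(lon)
        y += math.cos(lat) * math.sin(lon)
        z += math.sin(lat)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))


def hashtag_spread(points):
    """Mean distance in miles from each occurrence to the geographic midpoint."""
    center = geographic_midpoint(points)
    return sum(haversine_miles(p, center) for p in points) / len(points)
```

A hashtag used only in one city gets a spread near zero, while one used on opposite sides of an ocean gets a spread of thousands of miles.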
Idea impact differs from city to city
I wonder if it differs among Japanese cities too?
Makes me want to try analyzing geo data
Industry Track 2
Big Data and applications on the web
Defeated early on by sleepiness and my limited English.
Behavioral Analysis
HeteroMF: Recommendation in Heterogeneous Information Networks using Context Dependent Factor Models
U: user
I: item
R: UxI
estimate rating R
Requires enough ratings from users
50% of users are cold start
Use additional source of information
Collective Matrix Factorization
HeteroMF
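For reference, a minimal sketch of the plain matrix factorization baseline that estimates R from user and item factors. This is not HeteroMF or Collective Matrix Factorization, which bring in additional information sources; the hyperparameters and the toy ratings are assumptions of mine.

```python
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user factors and item factors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02,
             epochs=200, seed=0):
    """Fit R ~ P Q^T by stochastic gradient descent on observed ratings."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(P, Q, ratings):
    """Root mean squared error over the given (user, item, rating) triples."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


# Toy (user, item, rating) triples, made up for illustration.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q = train_mf(ratings, n_users=3, n_items=3)
```

A user with no ratings never gets their row of `P` updated, which is the cold-start problem the talk set out to address.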
When Relevance is not Enough: Promoting Diversity and Freshness in Personalized Question Recommendation
- Many users are new
- Users want to answer new questions
- Goal: realtime question recommendation for all user types
- Item and user cold-start scenarios
User, questions mapped to same space
questions assigned to other
Question profile:
- posting time
- LDA, Lexical, Category
- Feature space split according to the top category of Yahoo! Answers
User profile:
- probability tree
- constructed from the questions answered by the user
Question Recommendation
- Rank question profiles by similarity to the user profile
- Use IR-like approach for fast large scale ranking
- User profile contains several dozen weighted search terms
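A sketch of the ranking step as I understood it: treat both the user profile and each question profile as sparse bags of weighted terms and rank questions by cosine similarity. The choice of cosine similarity, the function names, and the sample profiles are my assumptions; a real system would use an inverted index rather than scoring every question.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(user_profile, question_profiles, top_n=3):
    """Return the ids of the top_n questions most similar to the user."""
    scored = [(cosine(user_profile, q), qid)
              for qid, q in question_profiles.items()]
    # Most similar first; ties broken by question id for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [qid for _, qid in scored[:top_n]]


# Made-up profiles: weighted search terms per user and per question.
user = {"python": 0.8, "pandas": 0.6}
questions = {
    "q1": {"python": 1.0, "pandas": 1.0},
    "q2": {"cooking": 1.0, "pasta": 0.5},
    "q3": {"python": 1.0, "java": 1.0},
}
top = recommend(user, questions, top_n=2)
```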
Online Experiment
- A/B test (control/bucket)
- Fail => add freshness & diversity
- With freshness & diversity => the number of answers increased
Doesn't this mean that freshness and diversity were what really mattered?
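I did not catch how freshness and diversity were actually combined with relevance, so the following is only a guess at one way to do it: a greedy re-ranking in which relevance decays with question age and is discounted each time a category has already been shown. The half-life, the penalty factor, and the candidate fields are entirely hypothetical.

```python
def rerank(candidates, now, half_life_hours=6.0, category_penalty=0.5, top_n=3):
    """Greedy re-ranking by relevance x freshness x diversity.

    candidates: list of dicts with keys id, relevance, posted (in hours)
    and category. All parameters are illustrative assumptions.
    """
    remaining = list(candidates)
    chosen = []
    seen = {}  # category -> how many times it was already picked

    def score(c):
        age = now - c["posted"]
        freshness = 0.5 ** (age / half_life_hours)
        diversity = category_penalty ** seen.get(c["category"], 0)
        return c["relevance"] * freshness * diversity

    while remaining and len(chosen) < top_n:
        best = max(remaining, key=score)
        remaining.remove(best)
        chosen.append(best["id"])
        seen[best["category"]] = seen.get(best["category"], 0) + 1
    return chosen


# Made-up candidates; "posted" is hours since some arbitrary origin.
candidates = [
    {"id": "a", "relevance": 0.9, "posted": 0.0, "category": "tech"},
    {"id": "b", "relevance": 0.8, "posted": 11.0, "category": "tech"},
    {"id": "c", "relevance": 0.7, "posted": 10.0, "category": "tech"},
    {"id": "d", "relevance": 0.6, "posted": 11.0, "category": "food"},
]
picked = rerank(candidates, now=12.0)
```

In this toy data the most relevant question "a" drops out because it is old, and "d" moves ahead of "c" because a "tech" question was already picked.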
Questions about Questions: An Empirical Analysis of Information Needs on Twitter
People are starting to ask questions on FB and Twitter
Social networks are not designed for information seeking
Information Need Detector!
- Understanding & Labeling
- Feature selection & Boosting
Information Needs Detection Experiment
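I thought the talk was about investigating information needs (IN), but before I knew it
the topic had shifted to trends and bursts.
What does that mean?
Does it mean hot topic analysis is also possible using information needs?
I should probably check the paper for this part.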
====
I only exchanged 10,000 yen, and now my remaining cash is dangerously low.
I need to do something about it...