研究な日々 (Research Days): WWW2013, Day 3

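WWW2013 is already on its third day.

The main conference starts today.

Having every session in the afternoon must be their way of saying: spend the morning out having fun.

Fearsome WWW, fearsome Brazil.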

Keynote

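Luis von Ahn

Duolingo translation

human computation

The person who created CAPTCHA

Achieve some goal by making use of what people already do for another purpose.

Is this a way of thinking that ties into shikakeology (仕掛学), or just a fashionable one?

It was interesting, but he spoke so fast that I could not follow many parts.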

Social web engineering

Pick-A-Crowd: Tell Me What You Like, and I’ll Tell You What to Do

Crowdsourcing

The pull methodology is suboptimal

Push Methodology : Task-to-Worker Recommender

Pick-A-Crowd: A system architecture that uses Task-to-Worker

Matching tasks with users

Rank workers using their Facebook Likes

Worker precision tends to be higher for categories in which the worker has many Likes

"Worker knows what they like"
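A minimal sketch of the ranking idea as I understood it, not the paper's actual model: score each worker by how many of their Likes fall into the task's category and sort by that score. The data layout (`worker_likes` as a dict of category labels) and the function name `rank_workers` are my own assumptions for illustration.

```python
from collections import Counter


def rank_workers(task_category, worker_likes):
    """Rank workers by how many of their Likes fall into the task's category.

    worker_likes maps a worker id to a list of category labels, one per
    liked page (hypothetical layout, used only for illustration).
    """
    scores = {
        worker: Counter(categories)[task_category]
        for worker, categories in worker_likes.items()
    }
    # Highest score first; ties are broken by worker id for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))


worker_likes = {
    "alice": ["soccer", "soccer", "movies"],
    "bob": ["movies", "cooking"],
    "carol": ["soccer", "cooking", "cooking"],
}
ranking = rank_workers("soccer", worker_likes)
```

A push-style system would then offer the task to the workers at the top of `ranking` instead of waiting for anyone to pull it.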

Groundhog Day: Near-Duplicate Detection on Twitter

There is a lot of similar information on Twitter = Duplicate Content

The aim is to find similar posts

1. Exact copy

2. Nearly exact copy

identical except for hashtags and URLs

3. Strong near duplicate

Same core message

4. Weak near-duplicate

Same core message, one tweet contains personal views

Convey semantically same, differing information

5. Low Overlap

Semantically same
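The first two levels above can be sketched with simple string rules. This is only my illustration: the talk used trained classifiers over feature combinations, whereas the word-overlap fallback and its `threshold` value here are assumptions of mine and do not separate levels 3 to 5.

```python
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")


def strip_tags_and_urls(tweet):
    """Lowercase a tweet, drop URLs and hashtags, and normalize whitespace."""
    text = URL_RE.sub(" ", tweet.lower())
    text = HASHTAG_RE.sub(" ", text)
    return " ".join(text.split())


def jaccard(a, b):
    """Word-level Jaccard overlap between two strings."""
    wa, wb = set(a.split()), set(b.split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def duplicate_level(t1, t2, threshold=0.5):
    """Return a coarse label for a pair of tweets."""
    if t1 == t2:
        return "exact"
    s1, s2 = strip_tags_and_urls(t1), strip_tags_and_urls(t2)
    if s1 == s2:
        return "nearly exact"
    # Crude stand-in for the weaker levels: high word overlap.
    if jaccard(s1, s2) >= threshold:
        return "near duplicate"
    return "different"
```

For example, two tweets that differ only in their hashtag and shortened URL come out as "nearly exact".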

- 20% duplicates in the search results

Classification strategies => different feature combination

- Accuracy of detecting duplicates

- What kind of features are important to detect

49% precision, 45% recall

67,63% to find duplicate levels

Reactive Crowdsourcing

Crowd Control

Goal: taming the crowd

- cost

- time

- quality

- motivation

I was not very interested in crowdsourcing, so my notes trail off partway through...

Spatio-Temporal Dynamics of Global Social Media

Challenges and opportunities

What are the geo-properties of ideas?

Using hashtags => why hashtags...?

Collect geo-tagged tweets with hashtags

2 billion data

1. Location properties of hashtags

2. Propagation of hashtags

3. Spatial analytics of locations

Hashtag distance vs sharing

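The farther apart two locations are, the less they share the same hashtags

Use the entropy of a hashtag

If the entropy is high, the hashtag should be spread across many different locations

However, entropy alone cannot tell whether those "many locations" are close to each other

So distance is brought in.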
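A small sketch of why entropy alone is not enough, assuming plain Shannon entropy over location labels (the exact definition used in the talk may differ, and the city lists are made up). A hashtag seen in a single place has entropy 0, but two hashtags split evenly over two places get the same entropy whether those places are neighbours or on opposite sides of the world.

```python
import math
from collections import Counter


def location_entropy(locations):
    """Shannon entropy (in bits) of a hashtag's occurrences over locations."""
    counts = Counter(locations)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up occurrence lists: one location label per tweet.
single_city = ["Rio"] * 8
two_nearby = ["Rio"] * 4 + ["Niteroi"] * 4
two_distant = ["Rio"] * 4 + ["Tokyo"] * 4

entropies = [location_entropy(x) for x in (single_city, two_nearby, two_distant)]
```

The last two entropies are identical, which is the gap that a distance-based measure is meant to fill.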

Hashtag's spread

S^h = \frac{1}{|O^h|} \sum_{o \in O^h} D(o, G(O^h))

25% of hashtags have a spread > 1000 miles
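A rough sketch of the spread computation, under my reading of the formula: O^h is the set of occurrences of hashtag h, G is their geographic midpoint, and D is a distance. Using the haversine distance and a midpoint obtained by averaging unit vectors is my assumption, as are the function names and the sample coordinates.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(p, q):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against rounding slightly above 1.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def geographic_midpoint(points):
    """Midpoint of (lat, lon) points, found by averaging 3D unit vectors."""
    x = y = z = 0.0
    for lat, lon in points:
        lat, lon = math.radians(lat), math.radians(lon)
        x += math.cos(lat) * math.cos(lon)
        y += math.cos(lat) * math.sin(lon)
        z += math.sin(lat)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))


def spread(points):
    """Mean distance (miles) from each occurrence to the geographic midpoint."""
    center = geographic_midpoint(points)
    return sum(haversine_miles(p, center) for p in points) / len(points)


# Approximate coordinates, for illustration only.
rio, niteroi, tokyo = (-22.91, -43.17), (-22.88, -43.10), (35.68, 139.69)
local_spread = spread([rio, niteroi])
global_spread = spread([rio, tokyo])
```

Unlike entropy, this separates a hashtag shared by neighbouring cities from one shared across continents.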

Idea impact differs from city to city

I wonder whether it differs among Japanese cities too?

This makes me want to analyze geo data

Industry Track 2

Big Data and applications on the web

Defeated early on by drowsiness and my English ability.

Behavioral Analysis

HeteroMF: Recommendation in Heterogeneous Information Networks using Context Dependent Factor Models

U: user

I: item

R: UxI

estimate rating R

Requires enough ratings from users

50% of users are cold start

Use additional source of information

Collective Matrix Factorization

HeteroMF
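For reference, the basic setting above (users U, items I, ratings R) can be sketched as plain matrix factorization trained with SGD. This is not HeteroMF or Collective Matrix Factorization, only the single-matrix baseline they build on; the hyperparameters and the toy ratings are my own choices.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02,
              epochs=500, seed=0):
    """Fit user factors P and item factors Q so that P[u] . Q[i] ~ rating.

    ratings is a list of observed (user, item, rating) triples.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating of item i by user u."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def mse(P, Q, ratings):
    """Mean squared error over the observed ratings."""
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)


toy = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q = factorize(toy, n_users=3, n_items=3)
```

A user with no ratings never gets their row of `P` updated, which is the cold-start problem that motivates pulling in additional sources of information.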

When Relevance is not Enough: Promoting Diversity and Freshness in Personalized Question

- Many users are new

- Users want to answer new questions

- Goal: realtime question recommendation for all user types

- Item and user cold-start scenarios

User, questions mapped to same space

questions assigned to other

Question profile:

- posting time

- LDA, Lexical, Category

- Feature space split according to the top category of Yahoo! Answers

User profile:

- probability tree

- constructed from the questions answered by the user

Question Recommendation

- Rank question profiles by similarity to the user profile

- Use IR-like approach for fast large scale ranking

- User profile contains several dozen weighted search terms
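The ranking step above can be sketched as follows, assuming both profiles are flattened into sparse term-to-weight dictionaries and compared by cosine similarity. The talk's user profile is a probability tree and the real system uses an IR-style index, so this flat version, its function names, and the sample terms are simplifications of mine.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term -> weight dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_questions(user_profile, question_profiles):
    """Return question ids sorted by similarity to the user profile, best first."""
    return sorted(
        question_profiles,
        key=lambda qid: (-cosine(user_profile, question_profiles[qid]), qid),
    )


user = {"python": 0.8, "pandas": 0.5, "soccer": 0.1}
questions = {
    "q1": {"python": 1.0, "pandas": 1.0},
    "q2": {"soccer": 1.0, "worldcup": 1.0},
    "q3": {"cooking": 1.0},
}
ranked = rank_questions(user, questions)
```

Treating the user profile as a weighted query is what makes a search-engine style index usable for fast ranking at scale.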

Online Experiment

- A/B test (control/bucket)

- Fail => add freshness & diversity

- With freshness & diversity => the number of answers increased

Doesn't this just mean that freshness and diversity were what mattered?

Questions about Questions: An Empirical Analysis of Information Needs on Twitter

People are starting to ask questions on FB and Twitter

Social networks are not designed for information seeking

Information Need Detector!

- Understanding & Labeling

- Feature selection & Boosting

Information Needs Detection Experiment

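I thought they were investigating information needs (IN), but before I knew it

the talk had turned to trends and bursts.

What does that mean?

Does it mean hot topic analysis is also possible through information needs?

I should probably check the paper for this part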

====

I only exchanged 10,000 yen, and my remaining cash is getting dangerously low.

I need to do something about it...


Thursday, May 16, 2013
