Sunday, May 19, 2013

WWW2013, Day 5


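Notes from the last day of WWW2013.
Because of my return flight, I could only attend the keynote.
It was Kleinberg's talk, so I was reasonably satisfied.
It may have been the most interesting keynote of all.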

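Incidentally, a promo for next year's WWW2014 was shown before the keynote.
Next year it's in Seoul, South Korea.
From the farthest country to the nearest one.
I suspect it could even be done as a day trip, lol.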

Keynote
Jon Kleinberg

Computational Perspective on Social Phenomena in On-Line Networks

The Web as a metaphor: library and crowd
Core question: combining content and structure
Basic questions
- What features of a message help predict its level of penetration?
- How can network structure help in understanding content?

Quoted phrases: the 2008 election news cycle
 - many phrases
Why do certain quotes stand out?
Movie Quotes as Viral Text

Algorithmic Recognition of Memorability
- Less probable in their word choice
 - Compared to a base language model trained on newswire
 - Not just individual words, but consecutive 2- and 3-word sequences
 - E.g. "You had me at hello"
- But more probable in their part-of-speech composition
 - "You had me at hello" has the same part-of-speech sequence as "I met him in Boston".
- A memorable quote: a sequence of unusual words built on a scaffolding of common part-of-speech patterns
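
To make the "less probable word choice" idea concrete, here is a minimal sketch of scoring a phrase by its average bigram surprisal under a smoothed language model. The toy corpus stands in for newswire, and the add-one smoothing and function name are my own assumptions, not details from the talk.

```python
import math
from collections import Counter

def bigram_surprisal(sentence, corpus):
    """Average negative log2-probability of the sentence's bigrams under an
    add-one-smoothed bigram model estimated from `corpus` (a list of sentences)."""
    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        tokens = line.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams) + 1  # +1 leaves room for unseen words

    tokens = sentence.lower().split()
    pairs = list(zip(tokens, tokens[1:]))
    total = 0.0
    for first, second in pairs:
        prob = (bigrams[(first, second)] + 1) / (unigrams[first] + vocab_size)
        total -= math.log2(prob)
    return total / len(pairs)

# Hypothetical stand-in for a newswire corpus.
toy_corpus = [
    "i met him in boston",
    "i met her in boston",
    "he met me in boston",
]
common = bigram_surprisal("i met him in boston", toy_corpus)
unusual = bigram_surprisal("you had me at hello", toy_corpus)
```

With this toy corpus the movie quote gets the higher surprisal, i.e. its word sequence is less expected. The same scoring could be run over part-of-speech tags instead of words to check the second half of the claim.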

Connective Media and Social Feedback Effect
In aggregate, memorable quotes are more general
- suggest a certain probability to the quote

Meme mutation
Mutation of textual memes as they travel from source to source
Genetic analogy for memes: the beginning of a formalization
- fitness function
- mutation mechanism and functional element
- population structure

Decisions in Social Media
interface with the network plays a role in your decisions
- User-defined groups 
participate in online collaborative project
decision to use Twitter Hashtag
Click on a product ad endorsed 

Diffusion Curve
Long standing framework
Probability of adopting a behavior depends on the number of network neighbors already adopting

Key issue: qualitative shape of the curves
Diminishing returns? Critical mass?
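
A rough sketch of how such a diffusion curve could be estimated from data: for each k, take the fraction of users who adopted among those who had k adopting neighbors. The observation format and the numbers are made up for illustration.

```python
from collections import defaultdict

def diffusion_curve(observations):
    """observations: iterable of (k, adopted) pairs, where k is the number of
    neighbors who had already adopted and adopted is True/False.
    Returns {k: fraction of users with k adopting neighbors who adopted}."""
    exposed = defaultdict(int)
    adopted_count = defaultdict(int)
    for k, adopted in observations:
        exposed[k] += 1
        if adopted:
            adopted_count[k] += 1
    return {k: adopted_count[k] / exposed[k] for k in sorted(exposed)}

# Made-up observations, for illustration only.
toy = [(0, False), (0, False), (1, False), (1, True),
       (2, True), (2, True), (2, False), (2, True)]
curve = diffusion_curve(toy)
```

Whether the resulting curve flattens out (diminishing returns) or jumps at some k (critical mass) is the qualitative question raised above.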

Prediction and Potential Influence
Likely to do something when more friends are doing it. Why?
homophily / selection vs influence

Structural Diversity
Dependence on number of friends :
  a first step toward general prediction
- Given the full pattern of connections among your friends, estimate probability of adopting a new behavior
Structural diversity
 - using motifs
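
One simple way to quantify structural diversity, sketched here as an assumption rather than as the method from the talk, is to count the connected components among the friends who have already adopted. Node names and edges below are hypothetical.

```python
def count_components(nodes, edges):
    """Number of connected components in the graph induced on `nodes`."""
    nodes = set(nodes)
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        # Keep only edges whose two endpoints are both in the node set.
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:  # depth-first traversal of one component
            current = stack.pop()
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return components

adopting_friends = ["a", "b", "c", "d"]
friend_edges = [("a", "b"), ("c", "d"), ("d", "e")]
diversity = count_components(adopting_friends, friend_edges)
```

Here the four adopting friends form two separate groups, so the diversity is 2 even though the raw friend count is 4.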

Application to conversational curation
Many sites are organized around evolving discussion threads
Basic problem : Algorithmic curation
Key sub-problem : Length prediction

Final Reflections: Toward a model of You
Not only massive populations, but also data on individuals
Software that understands your behavior better than you do
- Ex:How rapidly do you reply to email?

The web is powered by a feedback loop between people and information


===================
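Right after the keynote I headed to the airport,
but in the end the flight departed three hours late.
The reason: "the crew was late getting to the airport."
That can be a reason?!
That's Brazil for you...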







Friday, May 17, 2013

WWW2013, Day 4



Pointless notes, as usual.

Keynote
Computing with Brain Circuits
BMI

A monkey controlling a robot?
Brain-to-brain interface

Panel: “Net neutrality and Internet freedom”
Tim Berners-Lee, Professor at the Massachusetts Institute of Technology (USA), Director of the World Wide Web Foundation and President of the UK’s Open Data Institute
Dep. Alessandro Molon, Brazilian Congressman, Leading the Bill on Internet Regulatory Framework

The issue of privacy and freedom
It was in Portuguese, so I couldn't really follow

Is DRM evil?
HTML being free is different from everything being free

OSN Analysis and Characterization

Wisdom in the Social Crowd: an Analysis of Quora

Asking questions on the internet
  Google, Wikipedia, Online Q&A Service
Quora: Social Q&A

Analyze 3 Graphs
Users, Topics, Questions
- User-topic graph
- Social Graph
- Related question graph

More answers and higher-quality answers => more followers
Do social ties affect answers? => No

Impact of question degree
User attention on similar questions
Do users pay attention to similar questions?
 


Modeling/Predicting the Evolution of User Activity Graphs on OSN-based Applications

Studying user activity
Friendship graph: static & symmetric ... but user activity is dynamic & asymmetric
=> evolution of activity graph
Empirical characterization of user interactions
- A dynamic graph evolution model

Node behavior characteristics
- Active
- Alive
- Quit
Active Rate
- increases over time
Expansion Cost
- Active nodes / new nodes
- How many nodes does it take to become active?
- Explains how applications die out
Life-Span Distribution

Active user number prediction
- parameter estimation based on 20 weeks
- Active user number prediction for the following weeks


Google+ or Google-? Dissecting the Evolution of the New OSN in its First Year

Can a new online social network win users?
G+ is a good subject for answering this question
G+ = "Ghost Town"?

Measurement Methodology & Data Set
Crawled like crazy to collect the data

LCC in G+ => 43.5 => small...
LCC=Largest Connected Component

Active users => steadily increasing.
Campaigns make a difference

- 10% of users write 80% of the posts
- 1% of users write 90% of the comments

Power-law friend distribution

G+ growing rapidly
LCC active users steadily grow
=> Growing!!


Potential Networks, Contagious Communities, and Understanding Social Network Structure

Network properties
- Degree Distribution
- Shrinking diameter and edge densification
- Network Community Profile

Contagious Networks
- network spread via existing social ties

Model of Community growth
- Check the paper

Graph Analysis and Primitives

Subgraph Frequencies: Mapping the Empirical and Extremal Geography of Large Graph Collections

subgraph frequencies = motifs?

Estimating Clustering Coefficients and Size of Social Networks via Random Walk

- Global Clustering Coefficient
- Network average CC
- Number of registered users

Global CC Algorithm
C_g = Φ/Ψ
Ψ=Average Degree-1
Φ = (missed this part...)
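
Since I missed the definition of Φ, I can't reproduce the random-walk estimator itself. As a reference point, here is a sketch of the exact global clustering coefficient (closed connected triples over all connected triples), which is the quantity such an estimator would be compared against. The small graph is made up.

```python
from itertools import combinations

def global_clustering(adjacency):
    """Exact global clustering coefficient of an undirected graph given as
    {node: set of neighbors}: closed connected triples / all connected triples."""
    closed = 0   # connected triples whose two endpoints are also linked
    triples = 0  # all connected triples (paths of length 2, counted per center)
    for node, neighbors in adjacency.items():
        for a, b in combinations(sorted(neighbors), 2):
            triples += 1
            if b in adjacency[a]:
                closed += 1
    return closed / triples if triples else 0.0

# A triangle a-b-c with one extra node d attached to c.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
exact_cc = global_clustering(toy_graph)
```

This exact computation needs the whole graph, which is what a random-walk approach presumably avoids.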

Evaluation: the accuracy isn't very good...
Hmm. Less interesting than I expected.
Does something like this really get accepted at WWW?


============
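With that, the day ended and we all went out for drinks.
We went to a churrasco restaurant, and my stomach hurts from laughing and eating too much.
Brazil is fun.
Whether I'd come again is another question, though, lol.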

Thursday, May 16, 2013

WWW2013, Day 3


Day 3 of WWW2013.
The main conference starts today.

Having every session start in the afternoon must mean "go out and have fun in the morning."
Formidable WWW, formidable Brazil.


Keynote 
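Luis von Ahn
Duolingo translation
Human computation
The person who created CAPTCHA
Accomplish a goal by harnessing what people already do for some other purpose.
Is this an idea related to shikakeology, or just a currently fashionable way of thinking?
It was interesting, but he spoke so fast that I couldn't keep up with much of it.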

Social web engineering

Pick-A-Crowd: Tell Me What You Like, and I’ll Tell You What to Do

Crowdsourcing
The pull methodology is suboptimal
Push Methodology : Task-to-Worker Recommender

Pick-A-Crowd: a system architecture that uses Task-to-Worker recommendation
Matching tasks to users
Ranks workers using their Facebook Likes

Worker precision tends to be higher for categories with many Likes

"Workers know what they like"

Groundhog Day: Near-Duplicate Detection on Twitter

There is a lot of similar information on Twitter = duplicate content
The goal is to find similar posts
1. Exact copy
2. Nearly exact copy
    completely identical except for hashtags and URLs
3. Strong near-duplicate
    Same core message
4. Weak near-duplicate
   Same core message, one tweet contains personal views
   Convey semantically the same message, with differing information
5. Low overlap
   Semantically the same
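
As a rough illustration of the first two levels, here is a simple token-overlap baseline: strip URLs, hashtags and mentions, then compare what is left with Jaccard similarity. This is my own sketch, not the classifier from the paper, and the tweets are invented.

```python
import re

def core_tokens(tweet):
    """Lowercased word tokens with URLs, hashtags and @mentions removed."""
    stripped = re.sub(r"(https?://\S+|[#@]\w+)", " ", tweet.lower())
    return set(re.findall(r"[a-z0-9']+", stripped))

def jaccard(a, b):
    """Jaccard similarity of the two tweets' core token sets."""
    ta, tb = core_tokens(a), core_tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

t1 = "Earthquake hits Tokyo http://t.co/abc #quake"
t2 = "earthquake hits tokyo http://t.co/xyz #breaking"
t3 = "Lunch was great today"
near_exact = jaccard(t1, t2)
unrelated = jaccard(t1, t3)
```

A score of 1.0 after stripping corresponds to the "nearly exact copy" level. The weaker levels presumably need semantic features, which is where the feature combinations below come in.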

- 20% duplicates in the search results

Classification strategies => different feature combinations
- Accuracy of detecting duplicates
- What kinds of features are important for detection

49% precision, 45% recall
67,63% to find duplicate levels

Reactive Crowdsourcing

Crowd Control
Goal: taming the crowd 
- cost
- time
- quality
- motivation
I wasn't very interested in crowdsourcing, so my notes trail off here...

Spatio-Temporal Dynamics of Global Social Media

Challenges and opportunities
What are the geo-properties of ideas?
Using hashtags => why hashtags...?

Collect geo-tagged tweets with hashtags
2 billion data

1. Location properties of hashtags
2. Propagation of hashtags
3. Spatial analytics of locations

Hashtag distance vs sharing
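The farther apart two places are, the less they share the same hashtags

Uses the entropy of a hashtag
If the entropy is high, the hashtag should be spread across many places
However, entropy doesn't tell us whether those "many places" are close to each other
So distance is brought in.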
Hashtag's spread

s = \frac{1}{|O^h|} \sum_{o \in O^h} D(o, G(O^h))
25% of hashtags have a spread > 1000 miles
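
A small sketch of the two measures, under my reading of the notes: entropy over the locations where a hashtag occurs, and spread as the mean distance of each occurrence from the geographic midpoint of all occurrences. Treating G as the geographic midpoint and D as great-circle distance in miles are assumptions, and the coordinates are made up.

```python
import math
from collections import Counter

def location_entropy(locations):
    """Shannon entropy (bits) of the distribution of occurrences over locations."""
    counts = Counter(locations)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def haversine_miles(p, q):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(a))

def midpoint(points):
    """Geographic midpoint: average the points as 3-D unit vectors."""
    x = y = z = 0.0
    for lat, lon in points:
        lat, lon = math.radians(lat), math.radians(lon)
        x += math.cos(lat) * math.cos(lon)
        y += math.cos(lat) * math.sin(lon)
        z += math.sin(lat)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))

def spread(points):
    """Mean distance (miles) of the occurrences from their geographic midpoint."""
    center = midpoint(points)
    return sum(haversine_miles(p, center) for p in points) / len(points)

# Made-up occurrences: one hashtag seen at two points on the equator.
occurrences = [(0.0, 0.0), (0.0, 90.0)]
entropy_bits = location_entropy(occurrences)
spread_miles = spread(occurrences)
```

Entropy only says the hashtag is split evenly over two places. The spread adds that those places are thousands of miles apart.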

Idea impact differs from city to city

I wonder if it differs among Japanese cities too?
Makes me want to analyze geo data

Industry Track 2
Big Data and applications on the web

Knocked out early by drowsiness and my limited English.

Behavioral Analysis
HeteroMF: Recommendation in Heterogeneous Information Networks using Context Dependent Factor Models

U: user
I: item
R: UxI
estimate rating R

Requires enough ratings from users
50% of users are cold start
Use additional source of information

Collective Matrix Factorization
HeteroMF
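
For reference, a minimal sketch of plain matrix factorization trained with SGD, i.e. the baseline that needs enough ratings per user. This is not HeteroMF or collective matrix factorization. The ratings, rank and learning rate are arbitrary illustrative choices.

```python
import math
import random

def init_factors(n_users, n_items, rank=2, seed=0):
    """Small random user factors P and item factors Q."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_items)]
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item factor vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))

def rmse(P, Q, ratings):
    return math.sqrt(
        sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))

def train(P, Q, ratings, epochs=500, lr=0.01, reg=0.02):
    """SGD on squared error with L2 regularization, over observed ratings only."""
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(len(P[u])):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Made-up (user, item, rating) triples.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P, Q = init_factors(n_users=3, n_items=3)
error_before = rmse(P, Q, ratings)
train(P, Q, ratings)
error_after = rmse(P, Q, ratings)
```

A user with no ratings never gets their row of P updated, which is the cold-start problem the additional information sources are meant to address.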

When Relevance is not Enough: Promoting Diversity and Freshness in Personalized Question 

- Many users are new
- Users want to answer new questions
- Goal: real-time question recommendation for all user types
  - Item and user cold-start scenarios

User, questions mapped to same space
questions assigned to other

Question profile:
- posting time
- LDA, Lexical, Category
- Feature space split according to the top category of Yahoo! Answers

User profile:
- probability tree
- constructed from the questions answered by the user

Question Recommendation
- Rank question profiles by similarity to the user profile
- Use IR-like approach for fast large scale ranking
- User profile contains several dozen weighted search terms
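
A minimal sketch of the ranking step, assuming both profiles are sparse bags of weighted terms and that similarity is cosine. The notes only say "similarity", so cosine is my assumption, and the profiles are invented.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_questions(user_profile, question_profiles):
    """Question ids sorted from most to least similar to the user profile."""
    scored = [(cosine(user_profile, profile), qid)
              for qid, profile in question_profiles.items()]
    return [qid for score, qid in sorted(scored, reverse=True)]

# Hypothetical profiles.
user = {"python": 0.8, "pandas": 0.6}
questions = {
    "q1": {"python": 1.0},
    "q2": {"cooking": 1.0},
    "q3": {"python": 0.5, "pandas": 0.5},
}
ranking = rank_questions(user, questions)
```

At scale the user profile would act as a weighted query against an inverted index of questions, which is presumably what "IR-like approach" refers to.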

Online Experiment
- A/B test (control/bucket)
- Fail => add freshness & diversity
- With freshness & diversity => more answers

Doesn't this just mean that freshness and diversity were what mattered?


Questions about Questions: An Empirical Analysis of Information Needs on Twitter

People are starting to ask questions on FB and Twitter
Social networks are not designed for information seeking

Information Need Detector!
- Understanding & Labeling
- Feature selection & Boosting

Information Needs Detection Experiment

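I thought the talk was about information needs (IN), but before I knew it
the topic had shifted to trends and bursts.
What does that mean?
Can hot-topic analysis also be done via information needs?
I should probably check the paper for this part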


====
I only exchanged 10,000 yen, and my remaining cash is running dangerously low.
I have to do something about it...

WWW2013, Day 2


On the morning of day 2 I went to a tutorial, but it was dull, so I'm skipping it here.
In the afternoon I attended SWDM'13, where I was presenting.

SWDM'13 Workshop


Keynote
Disasters Response Using Social Life Networks
Ramesh Jain

Check this paper
Insight => Information
Social Web and Maslow's Hierarchy:
  
Fundamental Problem
- Connecting people to Resources
 - Effectively, efficiently, and promptly in given situations

Social Life Networks = Connecting people to resources

Aggregation and composition

Disaster Management Cycle
relief response recovery rebuilding prevention mitigation preparedness

Understanding needs and availability of resources is critical

Information is the key
- Data integration
- location analysis
- and so on

Situation mapping
Resource Management
Communication and Reporting

* Micro level data, and Macro level data

Social Structures:
  Person - Organizations - Human Society
From micro => macro

Data Ingestion
 Actionable Insights and Recommendations
Stream processing engine
 bridge high level concept of situation and low level data stream

E-mage data representation
 spatio-temporal element
 
Eventshop : Motivation
- Billions of data sources
- Environment for selecting and Combining
- appropriate sources to detect situation
- predict pro-active actions
- Interactions with different types of users
  Decision makers, individuals

# Build a system like that
# Perhaps it predicts situations from data. This may be related to what I want to do

Situation Recognition Algebra
What we have are based on our current experience. 
Will be enriched.

# The disaster-management angle is interesting

Social life network for Disaster Management

Social web has been useful in
- disseminating information
- situation awareness
- people finding
- warnings

All users are not equal
- How can we make personalized alerts?
- Turning Disassociated Data into Meaningful Information

LIMITATIONS
- Focus on being a broad platform
- Information must be extracted from limited text
- Difficult to extract signal
- Low signal-to-noise ratio

Tweeting applications must be the solution
- develop focused micro blogs
- Get all information from 'motivated' and collaborative user
- Help them solve their problem

LifeSaver App
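- Predictive analytics operators in EventShop
- interactions with data warehouses, databases, information systems
- building personal context in the form of personas and using it
- making EventShop available as open source
- developing the LifeSaver app

# Building personas sounds interesting. Could they be predicted from things like retweet relationships, including location estimation?

They seem to be doing many things centered on an application called EventShop, which sounds interesting.
I talked with him after the talk, so I'll try to get in touch by email.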

A Sensitive Twitter Earthquake Detector
Bella Robinson, Robert Power and Mark Cameron

Identifying earthquakes via global network of seismic stations
ESA Platform
ESA Burst Detection

Geo-location of tweets is not easy: only 1% have geotags
Use Yahoo's Geoplanet service to detect user's location: 70%

Nothing particularly new here

Delay: 3:03 (max 5:34)
I don't really see why this needs to be done with social media.

Text vs. Images: On the Viability of Social Media to Assess Earthquake Damage
Yuan Liang, James Caverlee and John Mander

Can damage be predicted from social media data?
Which is better suited to predicting damage, text or images?

35% of tweets contain a URL
 
Using Density
Tweet density
Retweet density
Number of user tweet

Spread Speed
Text spread constant
Media spread slow and Chaotic

Intensity Attenuation
Log RT density decays logarithmically with distance

The way it decays is quite interesting.

Comparing Web Feeds and Tweets for Emergency Management
Robert Power, Bella Robinson and Catherine Wise. 

Australian again
Emergency Response Intelligence Capability : ERIC

What information is there on Twitter about an event
Where is...

Compare Official data and Tweet data

Emergency service agencies provide useful information in different ways
Tweets from official sources:
 - are reported sooner
 - contain specific information
 - are detailed
 - are updated frequently
 - include information from the public
compared to personal tweets

Practical Extraction of Disaster-Relevant Information from Social Media
Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier. 

Categorised Sandy tweets 
- personal
- informative
 - caution and advice
 - casualties and damage 
 - donations

Extract information that contributes to situational awareness
Filtering, Clustering, Extracting

Filtering
- Is disaster related?
- Contributes to situational awareness?
class: 
- personal (also for friends)
- informative (for many people)
- others
Filter personal and others
Using WEKA

Classification
- Caution & advice
- Casualties & Damage
- Donations
- people
- information sources
- others
Using WEKA

Extraction
- Using crowdsourcing
Automatic Extractor
- Using CMU ARK Twitter NLP
  - Using CRF

Succeeded with roughly 70-80% accuracy
Many messages contribute to situational awareness
IE can speed up management
Various informative classes
We can use Machine Learning for all 3 tasks


Location-Based Insights from the Social Web
Yohei Ikawa, Maja Vukovic, Jakob Rogstadius and Akiko Murakami

Why twitter for disaster management?
APIs and so on

Inferring Location from the Social Web
Location Types and Use Cases in Disaster Management
- Location in Text
- Focused Locations
- User's current Location
- User's Location Profile

Use location data in crisis tracker

Confidence Score is calculated based on 
- Location Popularity
- Region Context
Confidence score is
- Location popularity x Region Context

Information Verification during Natural Disasters
Abdulfatai Popoola, Dmytro Krasnoshtan, Attila Toth, Victor Naroditskiy, Carlos Castillo, Patrick Meier and Iyad Rahwan

Boston Marathon
Social media vs mass media
Misinformation on social media.
Can social media be trusted?

BBC's user-generated content hub
"Social Media news agency"
Verification of authenticity of photo and video evidence
Time of day, weather, landmarks...
Accents spoken, ambient noises, etc

Truth in social media is a widespread concern

Journalists verify, verify, verify
Computer scientists apply ML, IR, NLP
I liked the way they put that.

As usual these notes barely count as notes, but I can read the papers later, so fine.
Far more interesting than yesterday's workshop.


Tuesday, May 14, 2013

WWW2013, Day 1


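I attended WWW2013, held in Rio, so here are my notes.

On the first day I arrived at 13:00 and attended the workshop below.
I was curious about the invited talk, but I could only join partway through, and it wasn't really about network dynamics, so I'm omitting it.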


The Role of Research Leaders on the Evolution of Scientific Communities 

- A group of leaders or influential members are able to affect the dynamics of entire community
Study the dynamics of scientific communities
- Identify leadership, community core
Data Source: dblp (computer science bibliography)
24 ACM SIGs considered => Leaders

Estimates a researcher's importance within a community
CoreScore of researcher r in community c at time t is given by
Hindex = #rct
I missed the formula, so check the paper.

Core members are awarded
It felt like they simply compared CoreScore with various other metrics.

Resolving Homonymy with Correlation Clustering in Scholarly Digital Libraries. 

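Find homonyms
Author names and the like

Classify papers correctly by authorship
Clustering, on the assumption that papers with similar content are mostly by the same author
paper => point in m-dimensional space
A tweak on how k-means clustering is done.
Decent results, perhaps?
See the paper for details.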

Examining Lists on Twitter to Uncover Relationships between Following, and Membership/Subscription. 

uncover information consumption patterns on twitter
relationship between listing following subscription

What are the similarities and differences among listing, following, and subscribing?
What about the temporal relationships?

DataSet:
907 Twitter users

List vs Following
Curators:
Substantial number of curators who have listed people without following

Subscribing vs Following
Distributions are more skewed than L vs F
Curators follow 11%

Listing vs Subscribing
Listing and subscribing are different

L & F are similar

Temporal Analysis
Analyzes how the relationships among L, F, and S shift over time

Looks interesting, so I'll check the paper later.

Analyzing and Predicting Viral Tweets. 

Detection of Spam Tipping Behaviour on Foursquare. 

Gave up here, having hit the limit of my stamina

Brazil is far away.