Show simple item record

dc.contributor.author Devine, Brandon James en
dc.date.accessioned 2014-04-09T22:27:04Z en
dc.date.available 2014-04-09T22:27:04Z en
dc.date.issued 2014-04-09 en
dc.identifier.uri http://hdl.handle.net/10211.3/118678 en
dc.description Includes bibliographical references (pages 47-50). en
dc.description.abstract Hashtags are a feature of tweets, written in the form #thisisahashtag, that are sometimes used to identify discourse topic and/or user sentiment. This method of word concatenation is a logical response to the 140-character limit that Twitter enforces on each tweet, but it poses a problem when machine learning techniques are applied to derive a tweet's topic or sentiment: because the training data for a tweet are limited, one naturally wishes to use those data to the fullest extent possible, yet one well-known method for segmenting strings does not necessarily work on hashtags. This potential underperformance stems from the method's reliance on a static training corpus when generating n-gram probabilities; the ephemeral nature of hashtags, which respond to current events and trends, instead calls for a dynamically updated training corpus. This thesis proposes a modification of that method that retains its probabilistic power while allowing slang and other neologisms typically found in hashtags to bubble up through the training data quickly enough to permit proper segmentation of terms that would otherwise not be recognized. I begin with a review of tweet structure and of the algorithm to be modified, and then discuss certain operational and methodological issues inherent in working with Twitter data. I then describe my algorithm and the techniques used to avoid those issues. I conclude with a discussion of the new algorithm's efficacy and ways in which it might be improved. en
dc.format.extent xi, 88 pages : illustrations en
dc.publisher Arts and Letters en
dc.relation.requires Mode of access: World Wide Web. en
dc.relation.requires System requirements: Adobe Acrobat Reader en
dc.subject.lcc P9.2 en
dc.title A Method of Segmenting Topical Twitter Hashtags en
dc.type Thesis en
dc.date.updated 2014-04-09T22:27:04Z en
dc.language.rfc3066 English en
dc.contributor.department Linguistics and Asian/Middle Eastern Languages en
dc.description.degree Master of Arts (M.A.) San Diego State University, 2014 en
dc.description.discipline Linguistics en
dc.contributor.committeemember Malouf, Robert P en
dc.contributor.committeemember Gawron, Jean Mark en
dc.contributor.committeemember Skupin, Andre en
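
The abstract's central idea, probabilistic segmentation backed by a corpus that is updated as new text arrives, can be illustrated with a minimal sketch. This is not the thesis's algorithm: it assumes a simple unigram model with dynamic programming as a stand-in for the n-gram method the abstract mentions, and the class name HashtagSegmenter, its methods, and the unseen-word penalty are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


class HashtagSegmenter:
    """Illustrative unigram segmenter whose counts can grow over time.

    Hypothetical sketch: a stand-in for, not a copy of, the thesis's method.
    """

    def __init__(self, max_word_len=20):
        self.counts = Counter()   # word -> number of times seen so far
        self.total = 0            # total number of tokens seen so far
        self.max_word_len = max_word_len

    def update(self, text):
        """Add the whitespace-separated tokens of `text` to the corpus."""
        tokens = text.lower().split()
        self.counts.update(tokens)
        self.total += len(tokens)

    def _log_prob(self, word):
        count = self.counts[word]
        if count:
            return math.log(count / self.total)
        # Assumed penalty for unseen words: shrinks tenfold per character,
        # so long unknown strings are strongly disfavored.
        return math.log(1.0 / (max(self.total, 1) * 10 ** len(word)))

    def segment(self, hashtag):
        """Return the most probable word split of a hashtag."""
        text = hashtag.lstrip("#").lower()
        n = len(text)
        # best[i] holds (log probability, words) for the best split of text[:i]
        best = [(0.0, [])] + [(-math.inf, [])] * n
        for end in range(1, n + 1):
            for start in range(max(0, end - self.max_word_len), end):
                word = text[start:end]
                score = best[start][0] + self._log_prob(word)
                if score > best[end][0]:
                    best[end] = (score, best[start][1] + [word])
        return best[n][1]


seg = HashtagSegmenter()
seg.update("this is a hashtag about the news this is fun")
print(seg.segment("#thisisahashtag"))   # ['this', 'is', 'a', 'hashtag']

# New vocabulary arrives later; the same segmenter picks it up.
seg.update("selfie sunday selfie")
print(seg.segment("#selfiesunday"))     # ['selfie', 'sunday']
```

Calling update again as fresh tweets arrive is what lets a previously unseen term become segmentable, which is the behavior the abstract contrasts with a static training corpus. How quickly new terms should gain weight, and how old ones should decay, is left open here.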

