Devine, Brandon James en 2014-04-09T22:27:04Z en 2014-04-09T22:27:04Z en 4/9/14 en
dc.identifier.uri en
dc.description Includes bibliographical references (pages 47-50). en
dc.description.abstract Hashtags are a feature of tweets sometimes utilized in identifying discourse topic and/or user sentiment, presented in the form #thisisahashtag. This method of word concatenation, while a logical response to the 140-character limit per tweet enforced by Twitter, does present certain issues in the context of applying machine learning techniques to derive a tweet's topic or sentiment: since the training data for a tweet are limited, one naturally wishes to utilize said data to the fullest extent possible, but one well-known method for segmenting strings does not necessarily work in the context of hashtags. This potential underperformance stems from a reliance on a static training corpus when generating n-gram probabilities; the ephemeral nature of hashtags that respond to current events and trends indicates instead a need for a dynamically updated training corpus. This thesis proposes a modification of said method that retains its probabilistic power while allowing slang and other neologisms typically found in hashtags to bubble up through the training data at a rate that permits proper segmentation on terms that would otherwise not be recognized. I begin with a review of tweet structure and the algorithm that is to be modified and then discuss certain operational and methodological issues inherent in working with Twitter data. I then describe my algorithm and the techniques utilized in avoiding said issues. I conclude with a discussion of the new algorithm's efficacy and ways in which it might be improved. en
dc.format.extent xi, 88 pages : illustrations en
dc.publisher Arts and Letters en
dc.relation.requires Mode of access: World Wide Web. en
dc.relation.requires System requirements: Adobe Acrobat Reader en
dc.subject.lcc P9.2 en
dc.title A Method of Segmenting Topical Twitter Hashtags en
dc.type Thesis en 2014-04-09T22:27:04Z en
dc.language.rfc3066 English en
dc.contributor.department Linguistics and Asian/Middle Eastern Languages en Master of Arts (M.A.) San Diego State University, 2014 en
dc.description.discipline Linguistics en
dc.contributor.committeemember Malouf, Robert P en
dc.contributor.committeemember Gawron, Jean Mark en
dc.contributor.committeemember Skupin, Andre en
