Title: Twitter Tweet Extended Metadata for Personal Data Lake Storage
A Personal Data Lake is a storage facility that unifies a person's data, such as social media posts, photos, bills, certificates, phone calls, texts and files. Having all this personal data in one place would be attractive to businesses. The user would set the privileges that determine what information can be accessed and who can access it. The "data gravity" of this accumulated data can be supported with third-party plugins that allow the data to be queried in a controlled way using purpose-built schemas. My proposal for this project is to focus on one part of the work required to develop a Personal Data Lake: Twitter tweet storage. The essentially unstructured text of a tweet is only a small part of the information logged when someone tweets. To store tweets usefully in the Personal Data Lake (i.e. in a way that allows them to be queried), they must first be analysed and associated with relevant extended metadata. The metadata should describe not only the content of the tweet but also its context. To analyse tweets, I would look at the words in the body of the message, the hashtags, the location of the user, the time of posting and similar attributes. Algorithms will be used to establish patterns in the data and ascribe extended metadata to each tweet. The tweets can then be stored, together with their extended metadata, in the Personal Data Lake.
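The analysis step described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual method: the `ExtendedMetadata` fields, the `extract_metadata` function, the tiny stop-word list and the choice of simple word counting as the "algorithm" are all hypothetical, and the tweet is passed in as plain values rather than fetched from any Twitter API.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "and", "for", "with", "this", "that", "ever"}

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

HASHTAG_RE = re.compile(r"#(\w+)")
NOISE_RE = re.compile(r"#\w+|@\w+|https?://\S+")  # hashtags, mentions, URLs
WORD_RE = re.compile(r"[a-z']+")


@dataclass
class ExtendedMetadata:
    """Assumed shape of the metadata stored alongside a tweet."""
    hashtags: List[str]
    keywords: List[str]
    location: Optional[str]
    hour_of_day: int
    weekday: str


def extract_metadata(text: str, created_at: datetime,
                     location: Optional[str] = None,
                     top_n: int = 3) -> ExtendedMetadata:
    """Derive content and context metadata from one tweet."""
    # Content: hashtags, normalised to lower case.
    hashtags = [tag.lower() for tag in HASHTAG_RE.findall(text)]

    # Content: most frequent body words, ignoring hashtags, mentions, URLs,
    # stop words and very short words.
    body = NOISE_RE.sub(" ", text.lower())
    words = [w for w in WORD_RE.findall(body)
             if len(w) > 2 and w not in STOP_WORDS]
    keywords = [word for word, _ in Counter(words).most_common(top_n)]

    # Context: where and when the tweet was posted.
    return ExtendedMetadata(
        hashtags=hashtags,
        keywords=keywords,
        location=location,
        hour_of_day=created_at.hour,
        weekday=WEEKDAYS[created_at.weekday()],
    )


if __name__ == "__main__":
    tweet = "Loving the new coffee place in Cardiff! Best coffee ever #coffee #Cardiff"
    posted = datetime(2016, 3, 5, 9, 30, tzinfo=timezone.utc)
    print(extract_metadata(tweet, posted, location="Cardiff"))
```

Keeping the derived fields in a separate record, rather than rewriting the tweet text, means the original tweet stays intact while the metadata gives plugins something structured to query.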
Deliverables: Final report
Student: Steven Arthur
Supervisor: Coral Yan-huang Walker
Moderator: Alia I Abdelmoty
Report: Archive