PUBLISHlab launches website to showcase prototypes
21 Dec. 2020
We are pleased to announce the launch of a new website for PUBLISHlab, our in-house media innovation laboratory that explores how emerging technologies can be used to help news organizations combat misinformation, increase engagement, and boost income.
Built with Flask, the website features a number of prototypes that use and adapt existing technologies in the name of journalism. These include a content recommender, a sentiment analysis tool, an automatic image tagger, and a publish-to-blockchain tool.
Content recommender
This prototype automatically suggests related news content. It takes in one or more news feeds, which are parsed using Universal Feed Parser and converted to a pandas DataFrame. It then extracts a headline from a randomly selected news story and performs named entity recognition (NER) using a natural language processing library called spaCy and the en_core_web_sm English language model. NER is the task of identifying named entities, such as people, places, and organizations, in a given text. Finally, it uses the CountVectorizer class and cosine_similarity function of a machine learning library called scikit-learn to create a document-term matrix and identify similar headlines.
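The final matching step can be sketched as follows. This is a minimal illustration rather than the prototype's actual code: it skips feed parsing and NER, compares raw headline text, and uses made-up headlines and a hypothetical helper named most_similar.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def most_similar(headlines, query_index, top_n=3):
    """Rank the other headlines by cosine similarity to headlines[query_index]."""
    # Build a document-term matrix of raw token counts, one row per headline.
    matrix = CountVectorizer().fit_transform(headlines)
    # Compare the query row against every row; the result has shape (1, n).
    scores = cosine_similarity(matrix[query_index], matrix)[0]
    # Sort every headline except the query itself, highest score first.
    ranked = sorted(
        (i for i in range(len(headlines)) if i != query_index),
        key=lambda i: scores[i],
        reverse=True,
    )
    return [(headlines[i], float(scores[i])) for i in ranked[:top_n]]


# Made-up headlines, used only for illustration.
headlines = [
    "Central bank raises interest rates again",
    "Interest rates rise as central bank fights inflation",
    "Local team wins championship final",
    "New smartphone launches next month",
]

for headline, score in most_similar(headlines, query_index=0):
    print(f"{score:.2f}  {headline}")
```

Headlines that share more words with the query score closer to 1, while headlines with no words in common score 0.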
Sentiment analysis tool
This prototype assigns a sentiment score to a text based on how positive or negative it is. As a pre-processing step, it identifies the predominant language of the user-submitted text and, if needed, translates it to English using the googletrans Python library. The text is then analyzed using VADER-Sentiment-Analysis, which looks for various text features such as negations, exclamation marks, all caps, degree modifiers, sentiment-laden slang, and UTF-8 encoded emojis. A sentiment score is then assigned at both the text and sentence level. Finally, sentences are color-coded to visually convey the sentiment score. The technology could be used to identify positive and negative news comments and product reviews.
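The color-coding step might look something like the sketch below. It assumes each sentence already has a VADER compound score between -1 and 1; the colors, the HTML output, and the helper names sentiment_color and render_sentences are illustrative assumptions, not details of the prototype.

```python
import html

# Cutoffs on the compound score. The 0.05 thresholds follow the convention
# suggested in the VADER documentation; the colors themselves are arbitrary.
POSITIVE_CUTOFF = 0.05
NEGATIVE_CUTOFF = -0.05


def sentiment_color(compound):
    """Map a compound score to a display color."""
    if compound >= POSITIVE_CUTOFF:
        return "green"
    if compound <= NEGATIVE_CUTOFF:
        return "red"
    return "gray"


def render_sentences(scored_sentences):
    """Wrap each (sentence, compound score) pair in a colored HTML span."""
    spans = []
    for sentence, compound in scored_sentences:
        color = sentiment_color(compound)
        # Escape the sentence so user-supplied text cannot inject markup.
        spans.append(f'<span style="color: {color}">{html.escape(sentence)}</span>')
    return " ".join(spans)


# Hypothetical scores. In practice each one would come from something like
# SentimentIntensityAnalyzer().polarity_scores(sentence)["compound"].
scored = [
    ("I love this phone!", 0.67),
    ("The battery is terrible.", -0.48),
    ("It arrived on Monday.", 0.0),
]
print(render_sentences(scored))
```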
Automatic image tagger
This prototype automatically tags images using a deep learning image model. When a user uploads an image, the image is pre-processed (e.g., resized and cropped) using PyTorch, a popular Python-based machine learning library often used in computer vision projects. The image is then run against a computer vision model. Currently, the only available option is ResNet-50, a general-purpose image classification model trained on ImageNet. The prototype then returns the top five labels along with their confidence scores. The technology could be used within a content hub to make images searchable.
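The last step, turning raw model outputs into ranked labels with confidence scores, can be illustrated without loading a model. The sketch below is plain Python rather than PyTorch, and the labels, logits, and the helper name top_k_labels are made up for illustration.

```python
import math


def top_k_labels(logits, labels, k=5):
    """Convert raw model outputs to probabilities and return the k most likely labels."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    # Softmax: each probability is the label's share of the total.
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up labels and logits; a real ImageNet classifier outputs 1,000 logits.
labels = [
    "tabby cat", "tiger cat", "Egyptian cat", "lynx",
    "red fox", "golden retriever", "sports car",
]
logits = [9.1, 7.4, 6.8, 4.2, 2.0, 0.5, -1.3]

for label, confidence in top_k_labels(logits, labels):
    print(f"{label}: {confidence:.1%}")
```

Because softmax normalizes across all classes, the confidence scores of the full label set sum to 1, and the top five are simply the largest of those values.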
Publish-to-blockchain tool
This prototype indexes news content on the EOSIO blockchain. It takes a news story and converts it to a blockchain- and news-media-friendly format derived from ninjs (News in JSON). The resulting JSON object is then published as part of a blockchain transaction on the Jungle Testnet (a popular EOSIO testnet). If desired, the content can be hashed using the SHA-256 algorithm before the transaction is pushed. The technology has a number of potential use cases, including protecting copyright, increasing journalistic transparency, and safeguarding content from data loss and censorship.
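The conversion and SHA-256 steps could be sketched as below, leaving out the blockchain transaction itself. The field names are a small subset loosely modeled on ninjs, and the story, the URI, and the helper names to_news_object and content_digest are hypothetical. SHA-256 is a one-way hash, so the digest acts as a fingerprint of the content and cannot be reversed to recover it.

```python
import hashlib
import json


def to_news_object(uri, headline, body_text, version_created):
    """Build a small news object; field names are loosely modeled on ninjs."""
    return {
        "uri": uri,
        "type": "text",
        "headline": headline,
        "body_text": body_text,
        "versioncreated": version_created,
    }


def content_digest(news_object):
    """Return the SHA-256 hex digest of a deterministic JSON serialization."""
    # Sorted keys and fixed separators make the same story always hash the same way.
    canonical = json.dumps(
        news_object, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical story used only for illustration.
story = to_news_object(
    uri="urn:example:story:1",
    headline="City council approves new budget",
    body_text="The council voted on Tuesday to approve the budget.",
    version_created="2020-12-21T09:00:00Z",
)
print(content_digest(story))
```

Any change to the story, however small, produces a different digest, which is what makes a published hash useful as evidence that content has not been altered.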
Beyond these four prototypes, PUBLISHlab is also working on news automation, news summarization, and the use of transfer learning to build more accurate computer vision models.