Thursday, March 26, 2015

Europe's Digital Single Market must be multilingual!

Vice-President A. Ansip and R. Kalniņš 
More than 3000 members and stakeholders of Europe's Language Community signed an Open Letter to the European Commission: "Europe's Digital Single Market must be multilingual". On the occasion of an official lunch in Riga on 26 March 2015, Rihards Kalniņš, Marketing Communications Manager at TILDE, handed the Open Letter with more than 3000 printed signatures to Andrus Ansip, Vice-President of the European Commission. The latter expressed his awareness of the multilingual challenge in building the Digital Single Market, was impressed by the number of signatures and asked for further input from Europe's Language Community.
Join the initiative and sign the Open Letter!

Sunday, March 15, 2015

Do we really want smart personal assistants?

During the ROCKIT Roadmap Conference in February, I was designated to take notes and summarize the results of the session for scenario 2, “smart personal assistants.” That's harder than it sounds! I think the most important thing we learned was that whether European businesses will be successful in this emerging technology area is highly dependent on the business models they adopt and the culture that develops in Europe around them.

When people think about smart personal assistants, they immediately assume that the goal is to build a rival to things like Siri – an engine that can assist any user on a wide range of topics. This is actually a no-go strategy for European business for two reasons.

The first is that it's not the European way. This kind of generic personal assistant makes a great sales vehicle for global giants, but is poor at delivering what the customer actually wants. By its very nature, it's really more of a smart *im*personal agent on commission. Historically, European businesses have been based on a strong business-to-business orientation. They understand their local contexts and verticals well, and provide better end-user products and services because of it. Their offerings are niche – sometimes so niche they're just ways of getting around the limitations in some technology one of the giants has been pushing – but there's no shame in that. It may not be the way to fast growth and glitzy headlines, but I for one would rather provide genuinely useful products and services. Couple that with the fact that it's what we do well and there's really no question about the right approach.

The second reason why it's a bad strategy for our community is that even if we wanted to, we couldn't compete for one simple reason – data. These systems work because the giants fielded them among enthusiasts when they were just “good enough”, and improved them massively with larger and larger amounts of data as their use grew. Yes, we need to do more to share data among ourselves, and yes, we may well have better machine learning – but they have first-mover advantage. The group consensus was that it would take us ten years before we were ready to start thinking about fielding a rival system, by which time the world will look completely different.

Once we recognize that this is the shape of the game that we're in, it tells us much more about what kind of community infrastructure and cooperation we need to create in order to support each other and do better all round. That will be the subject of my next blog post.

Jean Carletta University of Edinburgh



We invite stakeholders of all kinds to comment on these views, whether or not they were at the Roadmap Conference - are they right? Please use the comment facility below.

Friday, February 13, 2015

A glance at the latest and most comprehensive Roadmap for Conversational Interaction Technologies

The CITIA Roadmap Conference, with an impressive line-up of speakers and panellists, is around the corner (24-25 Feb in Brussels). We are excited to announce that the first version of the ROCKIT strategic roadmap, which will drive many of the sessions of the upcoming Conference and will be used to set the priorities of CITIA, is now available to view online.

The main goal of the roadmap is to engage public or private research organisations, including SMEs, in a constructive discussion towards full exploitation of new conversational interaction technologies. The roadmap should enable our community to compare prominent use cases, products and services, science and engineering capabilities, as well as readiness, needs and timeframes for future R&D. Entrepreneurs and researchers can use it to focus their R&D, partnerships and related strategic efforts.

This first version of the roadmap is the result of an on-going (2-year) consultative process which in its first year alone involved over 100 experts who provided their input during five physical workshops organised in conjunction with major sector events. For the science/technology areas alone, some 1,000 inputs were captured during these workshops. All those inputs were clustered, filtered and linked together across several layers, including:
  • 10 societal Drivers & Constraints
  • 5 generic R&D Scenarios
  • 10 Product/Service Types, with the added value, a SWOT analysis and a 10 year timeline for each of them.
  • 8 Science/Technology Areas, with cluster and 10 year timelines for each of them.
  • 7 Resource Types

The graphical version as well as a presentation of the first version of the roadmap is easily accessible via http://tinyurl.com/ROCKIT-v1

Figure 1 Initial view showing five interrelated layers.

The initial view (depicted in Figure 1) shows the main interacting layers, including: Drivers & Constraints (the “why”); Scenarios; Product/Service Types (the “what”); Science/Technology Areas (the “how”); and Resources. By hovering over any item, one may choose to see either a) a short description of that item or b) the cross-layer relationships with other items.

Clicking on any of the Product/Service Types or Science/Technology Areas allows drilling down to detailed information such as a SWOT analysis of a Product/Service Type (Figure 2), or the foreseen 10 year timeline for a particular Science/Technology Area (Figure 3).

Figure 2 The SWOT analysis of Generic Personal Assistants, under Product/Service Types.


Figure 3 The 10 year timeline of the Natural Language Interpretation & Generation, under Science/Technology Areas.

We now very much encourage discussions around the roadmap’s contents before, during or after the Conference. We are particularly interested in:
  • Verifying relationships between items
  • Establishing their readiness levels as well as
  • Measuring their expected social and economic impact.

If you want to be involved, just create a free account and visit the roadmap to add your comments and cast your votes.



Article contributed by Costis Kompis, Vodera

Costis Kompis is the managing partner of Vodera, a company that helps private and public organisations align their R&D activities, develop innovation strategies for emerging technologies and design new business models to capture market opportunities.

Tuesday, December 9, 2014

LT-Accelerate: A Major Text Analytics-Meets-Multilingual Talkfest in Brussels

LT-Innovate and Alta Plana, headed by text analytics community builder Seth Grimes, combined forces last week (4 - 5 Dec) to launch the first LT-Accelerate conference in Brussels. This attracted a broad range of analytics technologists and user companies to an in-depth conversation about the LT contribution to business opportunities in text and speech analytics, with a discreet emphasis on the multilingual European context.

For those who missed it, there’s a handy summary at Storify. The presentations and pictures of the event are on the event's website. You may also want to check out the @LTAccelerate Twitter channel and hashtag #LTA14.

Basic text analytics is now maturing, with a growing stable of tech companies offering APIs to their NLP solutions or dashboards that help user companies make sense of their “unstructured” data. At the same time, binary sentiment analysis models are starting to reach the limits of their relevance for many users, who now need more insight into how human emotions and intentions are expressed linguistically in the decision-making process. And multilingual text data modelling continues to raise barriers for global players, either due to the inherent structure of languages or to a lack of reach.

Here are four takeaways that we shall explore further in future blogs:

The future of market research: one of the biggest users of text analytics is the Market Research industry (worth $61.45B globally), which is currently morphing into a digital player by adopting new technologies of automated listening, mining and engaging. For MR, the future will involve, among other phenomena, a billion new Chinese tourists (think language, travel tech, tourist infrastructures, and communication generally) – an extraordinary opportunity for almost any business in Europe if they know how to address the challenge.

Getting Down to Semantics: The market opportunity for text analytics covers at least two very different families of data: business-generated text, such as that provided by publishers and every other customer-facing enterprise, and user-generated data, often sourced from social media and customer reviews.
Havas Media showed how they can now classify customer generated data into one of the four stages of the “customer decision journey” on the basis of linguistic cues, with a success rate of some 74%. This allows them to automate the classification of short consumer messages and thereby vitally inform retailers and others about the crucial decision process those customers go through.
On the business content production side, Elsevier demonstrated how they use proprietary semantic technology – known as a Fingerprint Engine – to enrich existing text from authors, patents, and increasingly foreign language data so that specialised STM searches can apply concepts rather than words alone. This can enable a science author, for example, to find exactly the right journal that matches his research specialty.
We shall come back in a later blog to other semantic solutions in this space.
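
To make the Havas Media idea of classifying short consumer messages by linguistic cues more concrete, here is a minimal sketch in Python. It is not Havas Media's method: the four stage names, the cue words and the function name classify_stage are all hypothetical, chosen only to illustrate how cue matching could work in principle. A production system would rely on far richer linguistic analysis.

```python
import re

# Hypothetical stage names and cue words, invented for illustration only.
STAGE_CUES = {
    "awareness": {"heard", "saw", "ad", "new"},
    "evaluation": {"compare", "vs", "better", "review", "reviews"},
    "purchase": {"bought", "ordered", "buying", "price"},
    "post_purchase": {"returned", "broke", "love", "warranty"},
}


def classify_stage(message: str) -> str:
    """Return the stage whose cue words overlap most with the message."""
    # Lowercase and split into simple word tokens.
    tokens = set(re.findall(r"[a-z']+", message.lower()))
    # Score each stage by the number of distinct cue words present.
    scores = {stage: len(tokens & cues) for stage, cues in STAGE_CUES.items()}
    best = max(scores, key=scores.get)
    # If no cue word matched at all, report the message as unclassified.
    return best if scores[best] > 0 else "unknown"
```

With these made-up cue lists, classify_stage("Just bought the new phone, ordered it online") returns "purchase", because two purchase cues outweigh the single awareness cue.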

Generation A to Z: The most unexpected data point in the whole event might well have been the claim by Robert Dale (Arria) that “by 2020, more texts in the world will be produced by machine than by humans.” Three European content generation tech suppliers (Data2Content, Yseop, and Arria) addressed the apparently massive market for automatically generating content from data, rather than about data. The challenges here are to understand data as information (which is where semantics comes in) and then to turn that information into a narrative that tells a story. In a sense, therefore, what natural language generation will be able to do is take the results of data analytics – i.e. data – and use language technology solutions to turn it into content that humans (and also machines presumably) want to read. Watch this space!

Relevant Data is not always Big: Although we were treated to some large numerical data points during the conference - IBM recorded 53 million social media posts during the 64 games of the Brazil World Cup this year and a 50-agent speech contact centre can generate about 11Tb of voice recordings a year – the oil company Total told a story about small data. It highlighted the extremely practical virtues of smart search, analysis and presentation of smallish sets of highly relevant data from a corpus on oil-well safety-standard issues. This showed how you can mine value from text data to optimise knowledge sharing within a business. And it demonstrated that in many cases business clients will want to tailor the solution to their own needs. A useful lesson in how to market certain kinds of text analytics solutions!

Friday, November 21, 2014

Quick Q&A: On the Earned Media Value of a Brand’s Social Activities

Earned, paid, and owned media are distinct species. If you haven’t laid out cash for a mention of your brand, product, or personnel in a media outlet, whether online or social, you’re deemed to have earned the coverage. (Take “earned” with a grain of salt. You may have laid out big bucks for a publicist or efforts to build your brand’s visibility.) If you’ve bought the coverage — advertising, for instance — that’s paid. And if it’s your outlet, then that media is owned.

Whether media is earned, paid, or owned, you want to measure the extent of attention and the effectiveness of your message. The effort can get quite involved when multiple channels and multiple exposures are in the mix. To get a precise picture, you have to engage in attribution modeling. When social platforms come into play, the effort can be substantial.

General social business challenges, and technical responses, are central topics at LT-Accelerate, a unique European conference, taking place December 4-5, 2014 in Brussels. We’ll have Roland Fiege of IPG Mediabrands speaking, on methodologies and tools for measuring the earned value of brand social-media activity. If this topic interests you as well, you’ll want to learn more. A quick Q&A I recently conducted with Roland is a start, then I hope you’ll join us in Brussels. First a brief bio –

Roland Fiege is head of social strategy at Mediabrands Social, home of Performly. In his spare time, he is working on a PhD project researching methodologies for measuring the value add of marketing on Facebook and Twitter. And next, -

Our interview with Roland Fiege

Q1: The topic of this Q&A is social media analytics. What’s your personal SMA background and your current work role?
Roland Fiege: My personal SMA background started with consulting projects evaluating social media listening systems back in 2009. In 2010-11, I was part of an international team at US technology company MicroStrategy that developed a solution that analyzed the social graphs of Facebook users to help brands to understand the interests and affinities of their “fans” better.

In my current work role, we analyze user interactions responding to brand messages on social media channels and have developed a model that attributes a monetary “earned media value” to these interactions. This allows brands to quantify and valuate the outcome of their social media investments.
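
As a rough illustration of what attributing a monetary value to interactions could look like, the sketch below multiplies interaction counts by a benchmark price per interaction type and sums the result. This is an assumption about the general shape of such a model, not the model Roland describes: the function name earned_media_value, the interaction types and the prices are all hypothetical.

```python
from typing import Mapping


def earned_media_value(
    interactions: Mapping[str, int],
    benchmark_price: Mapping[str, float],
) -> float:
    """Sum count * price over all interaction types.

    Both mappings are keyed by a hypothetical interaction type such as
    "like" or "share". A type without a benchmark price raises KeyError,
    so missing prices are noticed rather than silently valued at zero.
    """
    return sum(
        count * benchmark_price[kind] for kind, count in interactions.items()
    )
```

For example, with invented prices of 0.25 per like, 2.0 per share and 0.5 per comment, 100 likes, 10 shares and 20 comments would be valued at 55.0.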
Q2: What are key technical and business goals of the analyses you’re involved in?
Roland Fiege: The technical challenges are to keep the solution up to date with ongoing API changes by the most popular social networks and to loop back “real time” bidding price benchmarks into our systems (vs. a static benchmark). Another challenge is to meet the EU data privacy standards that enterprises, German ones especially, try to comply with.

Business-wise, the challenge is to establish a common understanding of how to attribute and valuate user interactions.
Q3: And what particular analytics approaches or technologies do you favor, whether for text, network, geospatial, behavioral, or other analyses?
Roland Fiege: We basically gave up on automated text analysis when it comes to sentiment. It never worked in Europe with all the different languages, dialects, irony etc. There was too much manual work involved that clients were not willing to pay for.

Currently we concentrate on the quantification of user engagement and its financial valuation.

Q4: To what extent do you get into sentiment and subjective information?
Roland Fiege: Our experience is that when users like, share, and comment on brand content, the sentiment involved is mostly positive or neutral. In contrast, most user posts on brand channels are negative and correlate with negative customer experiences. Since we measure the monetary value of brand communication, we only measure fan/follower interactions with brand content.
Q5: How do you recommend dealing with high-volume, high-velocity, diverse social postings — to ensure that analyses draw on the most complete and relevant data available and deliver the most accurate results possible?
Roland Fiege: We do not rely only on the APIs that Twitter, Facebook and YouTube (Google) provide but also use other (fire hose) data providers to get the most complete picture/dataset, also for retrospective analysis.
Q6: Could you provide an example (or two) that illustrates really well what you’ve been able to accomplish via SMA, that demonstrate strong ROI?
Roland Fiege: What we accomplish: Clients manage to optimize their content strategies in near real time, can compare the performance of their content (agencies) in different regions and countries, and can identify savings potential in the millions. It is the first time brands can calculate the total cost of ownership of their social media channels and have a clear Input vs. Outcome result all condensed into one KPI: Money.
Q7: I’m glad you’ll be speaking at LT-Accelerate. Please tell me about your presentation, briefly: What attendees will learn.
Roland Fiege: In this talk you will learn about the latest methodologies and tools to measure the Earned Media value of a brand’s activities on Facebook, Twitter and YouTube in hard currency.
Q8: Finally, do you have recommendations to share, regarding choice of data sources, metrics, analytical methods, and visualizations, in order to best align with desired business outcome?
Roland Fiege: I will share those in my presentation in as much detail as possible.
Thank you, Roland, for your responses. I’m looking forward to hearing more, at LT-Accelerate in Brussels.




This interview was conducted by Mr. Seth Grimes, the leading industry analyst covering text analytics, sentiment analysis, and analysis at the confluence of structured and unstructured data sources, and founder of Alta Plana Corp.

Wednesday, November 19, 2014

How Havas Media Views Consumer & Market Analytics

Inés Campanella, Havas Media
Our thesis: Language technologies — text, speech, and social analytics — natural language processing and semantic analysis – are the key to understanding consumer, market, and public voices. Apply them to extract the full measure of business value from social and online media, customer interactions and other enterprise data, scientific and financial information, and a spectrum of other sources. The insight you’ll gain means competitive edge, whatever your organization’s mission.

Insight, via business (and research and government) application of language technologies, is the central topic for LT-Accelerate, a new conference that takes place December 4-5 in Brussels.

I recently interviewed a number of LT-Accelerate speakers. My questions broadly cover the topics they’ll be addressing in their conference presentations. This article relays my Q&A with speaker Inés Campanella of Havas Media Group and her colleague Óscar Muñoz-García. I’ll provide a bit of background and short bios and then we’ll get directly to the questions and responses.

Our interview with Inés Campanella and Oscar Muñoz-García


Q1: The topic of this Q&A is consumer and market insight. What’s your personal background and your current work role, as they relate to these domains?
Inés Campanella: I hold a B.A. in Sociology and an M.A. in Research Methods. Through my work and studies, I specialized in the field of Sociology of Communication and Information Society. Given my background, I’m very keen on new communication models and behavior patterns mediated by new media (i.e., social networks and other social media) and how we can profit from this amazing stream of behavioral data to increase our knowledge of social behavior.
At Havas Media I work as a researcher within the Global Corporate Development Team. My role involves integrating scholarly research and social theory into market research, designing a conceptual framework for insights into online consumer behavior, with a special emphasis on buzz monitoring. One of my main responsibilities is helping to build qualitative, business-savvy content classifications to be used in the development of novel content analytics tools. My day-to-day tasks also involve working in practical, hands-on online market research analysis. This twofold approach allows me, and the team I work in, to come up with an innovative and yet pragmatic approach regarding what technology we are able to develop and what technical features we need to improve to meet our clients’ real needs.
Q2: What roles do you see for text and social analyses, as part of comprehensive insight analytics, in understanding and aggregating market voices?
Inés Campanella: Regularly listening to consumers is a task that marketers must undertake in order to know their audience, detect how people feel about them, and cater to their needs and desires. That said, the new shopping scenario, with people massively sharing their thoughts online and performing regular online research about products and brands, calls for a reassessment of the techniques traditionally used in market research. Social media sources offer many advantages here. In comparison with traditional quantitative techniques such as questionnaires, collecting opinions from social media sources means less intrusion, since it enables the gathering of spontaneous perceptions of consumers without introducing any apparent bias. In addition, the possibility of doing this in real time poses a clear advantage over other techniques based on retrospective data. Overall, this allows for more efficient and more nuanced business decision making.


So text analysis is and will increasingly be key to market research. Nevertheless, issues such as online privacy, anonymization and the degree of representativeness and objectiveness this data holds in comparison with other methods must be taken into account. We are only beginning to understand how we can combine these approaches in a solid, law-abiding methodological toolbox.
Q3: Are there particular tools or methods you favor? How do you ensure business-outcome alignment?
Oscar Muñoz: There are many tools for measuring consumer insights in online paid and owned media that are reaching a significant degree of maturity, for instance, Programmatic Advertising and Web Analytics platforms for paid and owned respectively. However, regarding tools for performing consumer analytics in earned media, there is a long road that still lies ahead for offering results that can be easily activated in communication strategies.

Content classification is needed to enable meaningful KPIs (key performance indicators). At Havas Media, we are working on sentiment KPIs that go beyond polarity identification (e.g., classification of emotions expressed by users towards brands and products), on consumer communities research studies via big graph analysis techniques, and on measuring the influence of ad campaigns on word-of-mouth through the analysis of correlations between advertising spend, spots’ audience, and social buzz.
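
One simple way to look at the relationship between advertising spend and social buzz is a Pearson correlation over paired observations (for example, one pair per week). The sketch below is only an illustration of that standard statistic, not Havas Media's analysis. The function name pearson and the idea of weekly pairs are assumptions, and correlation on its own says nothing about whether the campaign caused the buzz.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

A value close to 1 would mean buzz tends to rise with spend, a value close to -1 that it tends to fall, and a value near 0 that there is no linear relationship in the sample.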
Q4: A number of industry analysts and solution providers talk about omni-channel analytics and unified customer experience. Do you have any thoughts to share on working across the variety of interaction channels?
Inés Campanella: We live in a digitalized world and this means that we no longer find consumers in one location, environment or channel but, rather, in an ever-increasing variety of them. Traditional customer journeys are no longer valid and, thus, new strategies to engage with consumers — and avoid looking redundant in their eyes — are very much needed.

We have witnessed that, while companies struggle to connect their marketing strategies, they often lack a tool-supported holistic approach that ensures effective multi-channel and multi-device media strategies. Our ultimate goal at Havas Media is to integrate all data sources in order to track consumers across online and offline touch points, gathering information about them with the aim of performing real time automation of communication processes. An example: serving personalized, timely online ads, push messages, and e-mail recommendations. This is completely indispensable if we wish to address consumers in an effective way.
Q5: To what extent does your work involve sentiment and subjective information?
Inés Campanella: To a very large extent. Either when I’m directly dealing with data from social media listening projects or when we’re devising new business coding frames, we’re always trying to elucidate ways to leverage this source of subjective information and make it actionable.

On the other hand, carrying out accurate [sentiment] polarity analysis is essential, but we believe it is equally important to achieve a good classification and detection of recurrent conversation topics between users (e.g., regarding product features). Our deployment of content analytics tries to answer all these questions. Otherwise, we would be missing half the story.
Q6: How do you recommend dealing with high-volume, high-velocity, diverse data — to ensure that analyses draw on the most complete and relevant data available and deliver the most accurate results possible?
Oscar Muñoz: We deal with volume and velocity by leveraging Big Data processing platforms like Hadoop and the related ecosystem (e.g., HBASE, HIVE, Spark, etc.). To tackle diverse data, we spend a significant part of our computing resources on ETL (extract, transform, load) processes for normalizing, integrating, aggregating, and summarizing data from multiple social media channels (Twitter, Facebook, blogs, forums, etc.) according to a unique schema of linked data about content, users, and related metadata.
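
The normalization step of such an ETL process can be pictured as mapping each channel's raw records onto one shared record type. The sketch below assumes two invented input shapes and an invented Post schema. None of the field names come from Havas Media or from any real platform API, and a real pipeline would run on a distributed platform rather than in plain Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """Hypothetical unified schema shared by all channels."""
    channel: str
    author: str
    text: str


def from_microblog(raw: dict) -> Post:
    # Assumed raw shape: {"handle": ..., "message": ...}
    return Post("microblog", raw["handle"], raw["message"].strip())


def from_forum(raw: dict) -> Post:
    # Assumed raw shape: {"poster": ..., "body": ...}
    return Post("forum", raw["poster"], raw["body"].strip())


def normalize(microblog_rows: list, forum_rows: list) -> list:
    """Transform both sources into one list of unified Post records."""
    posts = [from_microblog(r) for r in microblog_rows]
    posts += [from_forum(r) for r in forum_rows]
    return posts
```

Once every channel is expressed in the same schema, aggregation and summarization can be written once instead of once per source.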
Regarding accuracy, our goal is to develop natural language processing (NLP) algorithms that are as precise as possible, to obtain confidence levels similar to other techniques like opinion polls. Unfortunately, this goal cannot be achieved easily. We combine machine learning and deep linguistic analysis techniques in order to find fair balances of precision and recall, but there is still a lot of work to be done.
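
For readers less familiar with the terms, precision is the share of items the classifier labelled with a class that really belong to it, and recall is the share of items really in the class that the classifier found. The sketch below computes both, plus their harmonic mean (F1), for one target label. The function name and the example labels are illustrative assumptions, not part of the Havas Media tooling.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    gold: Sequence[str], predicted: Sequence[str], positive: str
) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one target label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    # True positives: predicted the target label and it was correct.
    tp = sum(1 for g, p in pairs if p == positive and g == positive)
    # False positives: predicted the target label but it was wrong.
    fp = sum(1 for g, p in pairs if p == positive and g != positive)
    # False negatives: missed an item that really had the target label.
    fn = sum(1 for g, p in pairs if p != positive and g == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Tuning a classifier usually trades one measure against the other, which is why the answer speaks of finding a fair balance between the two.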
Q7: Could you provide an example (or two) that illustrates really well what your organization and clients have been able to accomplish via analytics that demonstrate strong ROI?
Inés Campanella: We’ve developed business indicators that allow us to better code and interpret social media listening data, namely marketing mix indicators and consumer decision journey stages. Ultimately, this has a very positive impact on ROI. Let me explain this in greater detail. 
To monitor in real time and accordingly react to the experiences that customers are sharing, our clients must know the purchase stages in which those consumers are gained and lost, in order to refine touch points, impact consumers at the right time, and achieve the desired result (that is, a transaction). Also, uncovering the exact content of the dialogues that customers are having lets marketers and advertisers keep better track of consumers’ mindsets. We’ve found that the combination of these two categories provides very valuable information for a better positioning of the brand or organization in the market and, therefore, for an improved return on advertising efforts.
Q8: I’m glad you’ll be speaking at LT-Accelerate. Your talk is titled “Understand Consumers: Mindset, Intentions, and Needs.” Would you please describe your presentation briefly: What will attendees learn?
Inés Campanella: I’m also glad I’ll take part in LT-Accelerate. I will introduce the audience to Havas Media Group’s current needs, challenges, and practices regarding content analytics. Specifically, I’ll comment on the research we have carried out regarding innovative classification of user generated content (UGC) to improve social media buzz monitoring. In short, I’ll explain the business need for these kinds of classifiers and how we can leverage and combine them with other market techniques and insights to improve our understanding of consumers’ mindsets and habits.



This interview was conducted by Mr. Seth Grimes, the leading industry analyst covering text analytics, sentiment analysis, and analysis at the confluence of structured and unstructured data sources, and founder of Alta Plana Corp.


Tuesday, November 18, 2014

From Social Sources to Customer Value: Synthesio’s Approach


Text analytics is an enabling technology for deep social media understanding. We apply natural language processing (NLP) and data analysis and visualization techniques in an effort to make sense of the diversity of social postings. The social intelligence that results advances customer engagement and informs efforts to meet marketing, customer experience, product management, and reputation management needs.

I interviewed Pedro Cardoso of social intelligence leader Synthesio as part of the preparation for December’s LT-Accelerate conference. Pedro will be speaking on language morphology (forms) in sentiment analysis. That’s a fairly technical topic, reflecting Pedro’s role as text analytics director at Synthesio, but one that will help business attendees understand the ins and outs of attitudes, opinions, and emotions in social and other text sources.

Pedro’s background: He earned an engineering degree in electronics and control systems and a master’s in speech processing. His career path started in Portugal, as a research engineer, followed by 4 years in Japan and 5 years in France. For the majority of this time, he worked on speech processing, mostly relying on machine learning for acoustic and language modeling. For the last 2 years, Pedro has been working on natural language processing at Synthesio in Paris.

Our interview with Pedro Cardoso


Q1: The topic of this Q&A is social media analytics. What’s your personal SMA background and your current work role?
Pedro Cardoso: My background is in machine learning applied to language technology. I started in the development of speech recognition systems, specifically language and acoustic statistical models. The focus was not on social media analysis (SMA), even if over the years I did some call-center development, including tests on sentiment analysis in voice. Over the last two and a half years, ever since I joined Synthesio, I have been working full-time on SMA.
Currently I am responsible for NLP and text analytics development at Synthesio. Our objective is to create algorithms that help process and analyse social data collected by Synthesio, so that it can be easily understood and exploited by our customers. This work includes data visualisation, document topic classification, and sentiment analysis.
Q2: What are key technical and business goals of the analyses you’re involved in?
Pedro Cardoso: Business drives technology, and customers’ needs drive business.
As mentioned above, our objective in the text analytics group is to find ways to structure and present information from social media sources in a simple way that customers can understand and get value from. Our focus is on text. We classify and summarize it with the goal of obtaining meaningful key performance indicators (KPIs) from large quantities of data, which would be impossible without technology.
We also develop methods for detecting key influencers and deriving demographic information. This allows our customers to focus their searches on particular groups of social media users.
Q3: And what particular analytics approaches or technologies do you favor, whether for text, network, geospatial, behavioral, or other analyses?
Pedro Cardoso: If we focus on my work, I favor text and also the study of network connections between online users. But if the question is what I believe to be the best technologies for SMA, that would have to be text also. Text is the medium; it is what customers use for communication. Network, geospatial, and other analytics are important, but mainly to focus our listening on a specific group. In the end, it is text, what social media users say, that counts.
Recently there has been interest in image analysis. People share more and more pictures. Sharing the picture of a brand logo or a product carries a strong brand loyalty message. Still, we need better image processing techniques, and we need to learn how best to use information from images, in particular how it combines with text in the case of comments.
Social media allows us to focus on particular customers and groups, and it allows us to have more personalized communications. In these cases, technologies such as demographic analysis and group detection gain favor, but if we discussed them further we would be getting off-topic.
Q4: To what extent do you get into sentiment and subjective information?
Pedro Cardoso: Automatic sentiment analysis is a large part of what I do as text analytics director. Our team is responsible for the development of automatic sentiment analysis at Synthesio, and has developed in-house the current support for the 15 languages offered as part of the product.
Subjectivity is a very complicated subject, and one that I believe no one has managed to solve. To understand subjectivity, you first need to understand the user well, along with the context in which a message was written. After all, the real meaning is in the person’s mind. We are still not there, and it might take a long while to get there.
Q5: How do you recommend dealing with high-volume, high-velocity, diverse social postings — to ensure that analyses draw on the most complete and relevant data available and deliver the most accurate results possible?
Pedro Cardoso: We have developed data crawlers that ensure we can capture, enrich, and standardize data from different sources worldwide, whether they come from the largest social networks (Twitter, Facebook, Sina Weibo, VKontakte, etc.), mainstream media sources, or blogs and forums (thanks to a dedicated sourcing team of 5 people). This approach allows us to deal with several million social mentions each day and to provide for each of them a sentiment assessment, a global influence ranking (proprietary algorithm), and potential reach (another proprietary algorithm), on an ongoing basis and in near real time. It takes less than 2 minutes for a mention to be crawled, parsed, enriched, and pushed into client interfaces. Once the data is structured with both metadata and enriched data, our clients can access it in their dashboard. They can work on global data volumes for main KPI tracking and trend analysis, on focused subsamples for deeper human qualitative analysis, or both.
Q6: Could you provide an example (or two) that illustrates really well what Synthesio’s customers have been able to accomplish via SMA, and that demonstrates strong ROI?
Pedro Cardoso: Sure. One of our clients in the automotive industry identified, through deep analysis of first-customer feedback in European forums, the key barriers to acquiring an electric car. Based on these lessons, they were able to create a far more efficient digital and social media campaign. ROI came before the campaign, from reduced costs in both message crafting and media planning, and after the campaign, which drove far more traffic to the Web site, and to dealerships for test drives, than previous efforts.
Another example is a telco that uses Synthesio both for listening and for engaging directly with its clients on social networks regarding their questions and complaints. By defining a precise listening scope and by clustering, combined with precise workflows for answer validation and publication, the client was able to measure ROI based on average answer time for any given question. By socializing answers to the most frequent topics, they also built up a customer-to-customer (C2C) advice platform, which allows top users to directly address other customers’ questions. ROI is also achieved via fewer inbound calls to the call center.
Q7: Do you have recommendations to share, regarding choice of data sources, metrics, analytical methods, and visualizations, in order to best align with desired business outcome?
Pedro Cardoso: At Synthesio we hold two key principles when it comes to social data and metrics. 
  • We believe social analytics and intelligence have to be global. We have sources covering more than 200 countries, networks crawled natively in more than 50 languages, etc. 
  • And they have to be simple. We built business-oriented metrics, comparable KPIs, and customizable interfaces to make sure that every single client within a company (from PR to marketing, from CRM to sales) can access the right data at the right moment.
Furthermore, we know that social analytics can’t be envisaged as another data silo. That’s why we pay so much attention to openness and interconnections with the other tools our clients use: digital marketing tools (such as consumer review platforms like Bazaarvoice, owned-community platforms like Lithium, and social marketing platforms like Spredfast), CRM tools (Salesforce.com, Microsoft Dynamics, etc.), and BI tools (IBM, etc.). Our open API helps them both push data to such tools and integrate data from other sources to get a 360° view of customer feedback, for instance.
The last recommendation we would like to share is “Don’t get too focused on data: Next step is people.” To better measure ROI, our clients have to go back to where it all began: Business is conducted by people and not by a data set. Being customer-centric for better targeting, better personalization of messages, and better understanding of the brand relationship is what guides all of our present and future developments. Even though our roadmap is our best kept secret, be prepared to see more demographic profiling, audience targeting tools, and sales-oriented measurement and anticipation metrics.
Q8: I’m glad you’ll be speaking at LT-Accelerate. Your topic is fairly technical (exploiting languages’ morphology for automatic sentiment analysis), though we do have a range of presentations on the program. Would you please tell me about your presentation, briefly: What will attendees learn?
Pedro Cardoso: The first thing we need to understand is the definition of morphology. Morphology of a word defines its structure: the root, part-of-speech, gender, conjugation, etc. And this is the first giveaway of the presentation. 
Continuing, I will show how the use of morphological information of words helped us at Synthesio in building sentiment analysis, in particular for less represented languages, those that offer less labeled [training] data. Also, it is an important part of the system for agglutinative languages, whose vocabulary is theoretically close to infinite.
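To make the idea concrete, here is a minimal sketch of how morphological information can stretch a small sentiment lexicon over inflected forms that never appeared in labeled data. It is an editorial illustration only, not Synthesio's system: the suffix list, the tiny root lexicon, and all function names are hypothetical, and a real system would rely on a proper morphological analyzer rather than naive suffix stripping.

```python
from typing import Iterator

# Hypothetical English suffixes and root lexicon, chosen only for illustration.
SUFFIXES = ("ing", "ed", "es", "s", "ly")
ROOT_POLARITY = {"love": 1, "enjoy": 1, "hate": -1, "disappoint": -1}


def candidate_roots(word: str) -> Iterator[str]:
    """Yield the word itself, then forms with one known suffix stripped."""
    word = word.lower()
    yield word
    for suffix in SUFFIXES:
        # Keep at least three characters of stem to avoid absurd roots.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            stem = word[: -len(suffix)]
            yield stem
            yield stem + "e"  # restore a dropped final 'e' (loving -> love)


def word_polarity(word: str) -> int:
    """Return the polarity of the first candidate root found in the lexicon."""
    for root in candidate_roots(word):
        if root in ROOT_POLARITY:
            return ROOT_POLARITY[root]
    return 0  # unknown words are treated as neutral


def sentence_polarity(text: str) -> int:
    """Sum word polarities over whitespace tokens with punctuation trimmed."""
    tokens = [token.strip(".,!?;:") for token in text.split()]
    return sum(word_polarity(token) for token in tokens)


if __name__ == "__main__":
    print(sentence_polarity("I loved the design but hated the disappointing battery."))
```

With only four roots in the lexicon, forms such as "loved", "hates", and "disappointing" still receive a polarity. That is the kind of generalization that matters most when labeled data is scarce or when a language builds many surface forms from one root.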
That wraps up this interview. I’m looking forward to Pedro Cardoso’s LT-Accelerate presentation. If you’re intrigued by what you read here, please do visit the conference Web site to learn more. And I hope you’ll join us 4-5 December 2014 in Brussels.





This interview was conducted by Mr. Seth Grimes, the leading industry analyst covering text analytics, sentiment analysis, and analysis at the confluence of structured and unstructured data sources, and founder of Alta Plana Corp.