Smart Content: The Content Analytics Conference

Full Agenda

The Smart Content program for Tuesday, October 19, 2010 is described in the agenda that follows. The program is complete.

8:00 am-9:00 am Registration & Coffee
Morning Session
9:00 am-9:20 am Chair's Welcome
An introduction to Smart Content: the business challenge, the market, solutions, and the conference.
Seth Grimes, Alta Plana Corporation
9:20 am-10:20 am Visionaries Panel
The business of Smart Content as described and discussed by executive visionaries representing the worlds of content management, analytics, and applications.
Moderator: John Blossom, President, Shore Communications Inc.
Natasha Fogel, EVP, Edelman StrategyOne
Michael Lavitt, The McGraw-Hill Companies
Mark Stefik, Research Fellow, Palo Alto Research Center (PARC)
10:20 am-10:55 am What Business Innovators Need to Know about Content Analytics
Big picture issues for content analytics, Web 3.0, and semantic search: An introduction to the technologies as related to the smart content business problem, defined as creating findable, meaningful, and reusable content.
Jeff Fried, CTO, BA-Insight
10:55 am-11:15 am Break
Lightning Talks
11:15 am-12:45 pm Lightning Talks
Brooke Aker, CEO, Expert System USA
We will show how semantic processing supports the explicit understanding (e.g. entities, categories, events) and implicit understanding (e.g. emotions, behaviors, motivations) necessary for superior online advertising content matching. Contextual advertising is a misnomer, since it runs on keyword matching only, yet it is the predominant form of content-to-advertising matching. More recently, semantic processing and matching of content to ads on explicit dimensions (e.g. entities, categories, events) has been shown to be a superior approach. In this talk we will show where this semantic advertising approach is headed: adding the implicit dimensions (e.g. emotions, behaviors, motivations) that can be understood during the matching process.

Marty Betz, FirstRain
How new technology is enabling Intelligent Business Search from the Web
The talk will cover the new technology ground being broken to make the Web a relevant, actionable database for business people. The speaker will cover how business intelligence style results and reports can be created for investors and sales and marketing people by leveraging new advances in semantic search, pattern and fact detection and anomaly detection on broad, diverse web content.

Charles-Olivier Simard, Product Manager, Semantic Technologies, Open Text

Steve Kearns, Basis Technology
This lightning talk will outline the Odyssey Analytics Platform, a solutions platform from Basis Technology that centralizes the identification, analysis, indexing, and querying of unstructured information, in more than twenty languages. We will describe some of the challenges that organizations face when building and using analytics technology on a global scale -- in terms of both scalability and language -- and how these challenges can be overcome using the right mix of technology and system architecture.

David Koppel, Co-founder, Subtext3
Extracting demographic information from free text

Sentiment analysis is being used to find out what the blogosphere thinks about various products and subjects. This information would be far more useful if we could associate demographic data with these sentiments. Instead of saying that 20% of bloggers like vodka, we could say that 60% of men between 35 and 50 like vodka. Subtext3 can provide that data, drawing on 15 years of machine learning experience in identifying gender and age from writing samples.

Guillaume Mazieres, EVP, TEMIS
Smarter Content: the Publishers Perspective

Digital content production and delivery have seen a period of challenges and opportunities in which Publishers seek new ways and means for growth and competitive advantage.

In this context, Content Enrichment has emerged as the key process enabling Publishers to make content more targeted and compelling for audiences, opening avenues for increased content monetization with new products, services and channels.

During this session, TEMIS will share their insights into the forces that have made Content Enrichment a mainstream technology in professional Publishing. You will discover how TEMIS' solution Luxid powers efficient Content Enrichment and you will explore how several Publishers use the Luxid Content Enrichment Platform to drive breakthrough knowledge discovery projects.

David Milward, Linguamatics
Transforming Social Media Content to Structured, Actionable Data using I2E
Given the influence and growth of social media, it is increasingly important to be able to automatically extract, analyse and synthesize information of interest. This talk will focus on Twitter and show how semantic technology based on Natural Language Processing can filter out noise and transform Tweets into structured data, making it possible to identify key populations, discover sentiment, provide early warnings, and establish influence networks. The talk will provide a series of case studies from business, politics and health, covering who uses which products or services, what they think of them, where they are used, and what their availability is.

Lee Phillips, VP Product Marketing, Acquire Media
Real-time Entity Extraction, Categorization, Sentiment Scoring, and Human Review: Synergy

Adding intelligence to content requires a complex trade-off of completeness, accuracy, and latency. Processing digital business news (about 400,000 distinct articles per business day), NewsEdge navigates the trade-off curve for multiple audiences: black-box robots, news-driven traders, momentum-driven traders, researchers, archivists, and publishers -- fueling instant as well as retrospective analytics. The key is to iterate, since each process in the pipeline -- Extract, Categorize, Score, Review -- borrows strength from the others and we go around the track multiple times. Stories hop off the bus at different latency stops. At the end of the line we have perfection; the trick is to be sure all the intermediate stops yield useful results for their intended audiences. We will discuss techniques like pre-classification, FLASH (Fast Linear Array Search of Headlines), taxonomy tree trimming, story shortening, and other insider secrets.

Rashmi Vittal, Enterprise Content Management, IBM
Leverage the power of content analytics

Today leading organizations need to use the wealth of information combined with the power of analytics to reach better, faster decisions and actions, optimized processes, and more predictable outcomes. However, organizations whose performance depends on large volumes of unstructured and semi-structured information have no ability to gain access, control and insight from that information to make better business decisions or gain a business advantage. Hear how only IBM can leverage the power of content analytics to:

  • Interactively discover content for new business insight
  • Interactively assess for content preservation and decommissioning to reduce storage costs and risk exposure
  • Provide powerful solution modeling and support for advanced classification tools for more accurate, deeper insight
  • Deliver key insights to other systems, users, and applications

Mark Stefik, Palo Alto Research Center (PARC)
The Role of Content Analytics in Media Curation

News companies and publishers act as curators for their readers. In the past, they curated primarily their own content. Now in the age of social media, they are struggling to apply manual curation techniques to the flood of user generated news and commentary, especially along the Long Tail. Content analytics technologies can help automate the curation process to meet the scale and speed of the Web.

However, to be effective, content analytics must be intelligent enough to model and amplify the judgments of human curators. At PARC, we have developed a prototype content classification and targeting technology called Kiffets that combines the perspective of human curators with artificial intelligence. Curators specify good topics, sources, and examples to reflect a point of view for the curated collection. The system learns curator intent from the examples and creates models for classification that are more sophisticated than keyword or entity-based patterns. The system further finds relationships between topics, articles, and collections automatically and makes these connections visible to add context and increase reader engagement.

In this session, we will discuss how publishers can use content analytics technologies to extend the reach of their editors to curate Long Tail content efficiently, and present the PARC Kiffets technology along with a demonstration.

Izzet Agoren, Business Development Director, Crystal Semantics
Semantic Advertising Technology: Using semantics to target placements & protect brands

Izzet Agoren will give an overview of the power and impact of semantics in advertising online. His presentation will include discussion of the following components:

  • Linguistics
  • Taxonomy development
  • Indexation
  • Technology application
  • Word sense disambiguation

Serafim Scandalos, Sales & Marketing Executive, Neurolingo LP

We will give an overview of the Mnemosyne Framework, a solutions platform by Neurolingo that collects, analyzes, semantically annotates, distills, and stores information from oversized unstructured data collections.

We will focus on the high-level architecture, the underlying technology and the key features and will also present some insights from indicative projects as well as our approach to the field of Content Analytics.

Rohini K. Srihari, Ph.D., CEO, Founder and Chief Scientist, Janya
Save money, time AND get better results? Why machine learning is a big part of the picture for content analytics.

In order to get the best results, content analytics systems typically need to be customized to handle the unique or localized language for a particular knowledge domain such as medical records or financial documents. Often this requires the laborious use of experts to analyze the language and create "rules" or lexicons of terms for these domains.

All of this takes time and money. However, machine learning techniques combined with automated processes for performance evaluation can significantly speed up this process and provide better results than rule or lexicon based systems alone. In this talk, we will demonstrate the role of semi-supervised machine learning in content analytics and discuss case studies in government and business that have used this technology.

Leonoor van der Beek, Research Manager, Q-Go
Continuously tap into the voice of the customer
Customers are continuously telling organizations what information they are looking for, which products they would like to buy, and what actions they would like to complete, by entering questions and search queries on the organization's website. A company that understands what customers are asking is able to answer questions online, optimize web content and sell products that fit the customer's needs. Q-go's self-service solution is powered by semantic search technology that not only enables you to hear the voice of your customer, but also helps you understand what it's telling you. This talk will explain how to unlock the business value contained in online customer interaction and why understanding the voice of the customer requires a combination of smart algorithms, deep knowledge of your customer's language and loads of experience with actual user interactions. We will argue that understanding your customer is crucial for dealing with the growing number of mobile internet users, who expect immediate results and easily turn away if they don't quickly find what they are looking for.

Anthony Vito, Software Engineer, TextWise
The Value of Semantic Discovery in CRM
This talk describes the theory of TextWise Semantic Signatures, along with the functions, features, and use cases of the TextWise API, both in the general case of media and in the specific case of enterprise data. It also demonstrates a Salesforce application that uses the TextWise API to automatically discover knowledge base articles related to customer cases.

José Carlos González, CEO, DAEDALUS
Multilingual Web Mining Analytics

DAEDALUS has been providing solutions based on language technologies for the media, publishing and information services industries since 1998. Currently available solutions include:

  • Spell, grammar and style proofing of texts in English, Spanish, French and Italian
  • Automatic classification of texts/news according to standard schema (IPTC, in the media industry) or thesaurus (Eurovoc)
  • Automatic clustering
  • Plagiarism detection
  • Text filtering (anonymization, detection of inappropriate text)
  • Automatic information extraction from web sources
  • Semantic labeling
  • Fuzzy and semantic search
  • Sentiment and opinion analysis/mining
Current work includes multilingual semantic-based web mining analytics, where semantic analysis, including disambiguation, makes reference to entities and relations according to ontologies or thesauri (SUMO, DBpedia, WordNet).

Lunch & Networking
12:45 pm-1:45 pm Lunch & Networking
1:45 pm-2:20 pm Imagine a Nimble World: Challenging the Publishing Industry

Content needs to be free (like a bird, not a beer) -- free to be viewed across any platform at any time.

To survive in the digital age, publishers must find engaging ways to re-package their content as products and services with a distinct value to customers.

In this discussion, we'll address challenges facing the publishing industry today, including if and how to monetize content, and how publishers can successfully make the transition to the digital economy, add circulation, find new readership, increase ROI, deliver valuable content to a growing range of platforms and devices, and deepen audience engagement.

Content Strategist Rachel Lovinger recently authored Nimble, a report that provides a thorough analysis of the digital publishing industry, commissioned by Razorfish's Media & Entertainment Practice in partnership with information services company Semantic Universe.

We'll discuss paid content, and what success looks like for brands (think a value-add approach) as well as failure. She will also touch on new revenue streams that leverage the opportunities specifically offered by digital content, and how publishers can plan for all platforms.

Rachel Lovinger, Content Strategy Lead, Razorfish
2:20 pm-2:55 pm How Semantic Technology Will Enrich Our Lives: Scientific Research, Advertising and Everyday Search
Semantic technology applications are popping up everywhere. Join industry expert Darrell W. Gunter as he shares how semantic technology is improving scientific research, advertising and everyday search.
Darrell W. Gunter, EVP/CMO, Collexis
2:55 pm-3:30 pm Content, Data and Humans: Putting Analytics into Action

We talk a lot about analytics, measurement and metrics online - but how do we turn all that data into tangible, successful changes in our content? How can analytics best inform our publishing decisions?

The bridge between analytics and content is a human investigation. We'll show you how to drive your analytics towards specific and well-informed content changes by understanding the limitations of measurement, and how to mitigate those limitations by understanding your content and your audience from the inside out - through testing, auditing and due process (with a bit of creative inspiration thrown in).

We'll talk about how we've applied analytics data to make effective and beneficial publishing decisions in three very different scenarios, all from our consultancy work:

  1. A path through an iPhone purchase
  2. A deconstructed health insurance plan, and
  3. What analytics told us about our own blog
We will also step back and explore the granularity and discreteness of content and data, to help businesses of all kinds tell the right story to the right audience, both big and small.

3:30 pm-3:50 pm Break
Application Spotlights
3:50 pm-4:00 pm Semantics at Work: Uses and Benefits of Smart Content
4:00 pm-4:20 pm Corporate Reputation and Risk Management

Corporate reputation is a collective representation by stakeholder audiences of a firm's past actions, results, and communications. Reputation encapsulates expectations by stakeholders about future behaviors of the corporation, and thus determines the relationships between the organization and its stakeholders. A strong reputation becomes a capital asset, a resource, a competitive advantage, and an intangible that the firm can manage to support organizational goals. Corporations that routinely monitor, measure, and evaluate their reputation, risk, and the drivers of reputation and risk among specific stakeholder audiences will be able to identify opportunities and threats, develop action plans, and support the achievement of desired business outcomes.

In a networked world, corporations must re-orient their listening in the competitive marketplace to be faster, smarter, and more stakeholder-centric. This presentation examines how several Fortune 500 corporations (AT&T, Yahoo!, Monsanto, and Electronic Arts) are using (i) real-time monitoring of traditional and social media combined with (ii) text analytics, and (iii) brand reputation algorithms to guide strategy and tactics. Advanced content analytics -- content tagging and enrichment, sentiment analysis, brand tracking, measurement, stakeholder segmentation, and content management -- delivers a more holistic view of stakeholders in the competitive landscape.

By understanding, through content analytics, leading indicators of opinion trends towards a company, its brands, its products and services, and by understanding issues, topics, and market trends that can affect the corporation, companies can take steps to gain competitive advantage and protect themselves from risk.

David Geddes, Vice President, Research and Development, evolve24
4:20 pm-4:40 pm Analyst Tools, Competitive Analysis, and Market Intelligence

As little as five years ago, researchers struggled to find enough valuable nuggets of information on their corporate competitors, emerging technologies and market trends. Nowadays, the problem facing those same researchers is their constant struggle to manage the flood of information, both valuable and irrelevant.

Today, for many of their research projects, analysts need to develop new tools and techniques to effectively and efficiently organize the data, and make sense of it. Software solutions are now available that enable not only new insights, but also new ways to display the results to the decision makers.

This presentation will discuss a number of case studies of analytical projects using new software tools to support corporate executives in developing their marketing and sales strategies.

Fred Wergeles, President, Fred Wergeles & Associates LLC
4:40 pm-5:00 pm Delivering Richer, Smarter, Targeted Content

Discover how the latest-generation AVIATION WEEK information products have put semantic technologies to work to deliver smarter content to their readers and customers.

Serving over 1.2 million professionals in 185 countries, AVIATION WEEK is the largest information and services provider to the global commercial, defense, space and business aviation communities. AVIATION WEEK delivers news, analysis and intelligence through its magazines, newsletters and databases of company information for aerospace manufacturers, airlines and providers of maintenance and services.

Like many BtoB publishers, AVIATION WEEK was faced with the following daunting challenges:

  • Exponential Information Growth
  • Wide - and Growing - Array of Delivery Platforms
  • Fragmented Audiences
  • Lower Barriers to Entry
  • Shaky Advertising Model
  • Expectations of Free Content
To gain market share and enhance customer experience, AVIATION WEEK has deployed several content enrichment applications over the past year:
  • Automatically Linking tagged entities in articles to Organization and Program topic pages
  • Building on-the-fly special reports on specific industry events and topics
  • Identifying new organizations and people mentioned in news articles
  • Categorizing content against a taxonomy
  • Cross-pollinating its content by linking it to related content from other McGraw-Hill publications and possibly beyond
  • Globally increasing the value proposition for AVIATION WEEK's content, products and services.
Live demonstrations and real world use cases of the above capabilities will be given throughout the talk.

Michael Lavitt, The McGraw-Hill Companies
5:00 pm-5:20 pm Actionability and ROI from Social Media Sentiment Analysis: Myth or Miracle
Few studies have established definitive links between social media sentiment analysis and behavioral outcomes. While many sophisticated systems exist that can aggregate social media content and assign sentiment polarity scores, correlating their use to demonstrable ROI remains an elusive goal. This presentation will review two case studies, from the life sciences and financial services industries, exploring the utility of social media sentiment analysis.

The first case study examines the use of Twitter as a critical component of oncology community dialogue during a medical conference. Key learnings from this study can be enacted by pharmaceutical/biotechnology companies, academic associations, clinical institutions and non-profit organizations alike to improve healthcare community collaboration and deliver measurable results.

The second case study will review Twitter as a tool to augment trading strategies in the absence of real news. Currently, numerous natural-language processing based systems exist in support of algorithmic trading strategies. Most systems report an accuracy of sentiment assignment >90% compared with human raters. However, major limitations exist for such systems. First, they depend on a critical mass of news events actually becoming available in the public domain, and second, they rely on a standard sentiment lexicon to detect news polarity, which may not always be sensitive to industry-specific topics. Twitter is gaining increasing popularity with the trading and equity research community, despite not being universally accepted as an authoritative source of "news." This case will explore the potential of tweet aggregation to guide equity trading decisions in the absence of real news.
5:20 pm-5:40 pm Content Analytics for Better Search
In this presentation we'll talk about key phrase extraction, collocations, and statistically improbable phrases. We'll show how these automatically extracted phrases can be used for enrichment of content such as venue reviews or book content. To show how such phrases can be used to improve a typical search experience, we'll show how we at Sematext use our own Key Phrase Extractor (KPE) to enhance search results on a couple of public search sites we operate, where KPE is used to provide search facet-like query refinement and drill-down functionality.
Otis Gospodnetic, Partner, Sematext International
5:40 pm Wrap-up
5:40 pm-7:00 pm Networking Reception
© 2010 Smart Content Conference · Organized and Produced by Alta Plana Corporation