SAP HANA Platform

Document Version: 3.0 – 2016-05-11

What's New in SAP HANA Advanced Data Processing

(Release Notes)


1 What's New in SAP HANA Advanced Data Processing

2 SAP HANA Advanced Data Processing SPS 12, Features Included in Revision 120

2.1 Enabling Search (New)

2.2 Text Analysis (New and Changed)

2.3 Text Mining (New and Changed)

3 SAP HANA Advanced Data Processing SPS 11, Features Included in Revision 110

3.1 Enabling Search (New)

3.2 Text Analysis (New and Changed)

3.3 Text Mining (New and Changed)

4 SAP HANA Advanced Data Processing SPS 10, Features Included in Revision 100

4.1 Text Mining (New and Changed)

5 SAP HANA Advanced Data Processing SPS 09, Features Included in Revision 90

5.1 Overall Documentation Changes

5.2 Enabling Search (New and Changed)

5.3 Text Analysis (New and Changed)

5.4 Text Mining (New)


© 2016 SAP SE or an SAP affiliate company. All rights reserved.

1 What's New in SAP HANA Advanced Data Processing

Use this document to find out about new and enhanced features of the SAP HANA advanced data processing option for the SAP HANA platform as of support package stack 09.

Table 1:

Support Package Stack (SPS)

–  –  –

This document accumulates the SAP HANA advanced data processing features for all SAP HANA support package stacks (SPS) and corresponding revisions. For support package stacks lower than SPS 09, see What's New in the SAP HANA Platform (Release Notes).

To find out about issues that have been fixed in a specific revision, see the SAP Note of that revision on SAP Service Marketplace.

Related Information

What’s New in the SAP HANA Platform (Release Notes)

SAP HANA Advanced Data Processing on SAP Help Portal

–  –  –

Creating Full-Text Indexes Using CDS Annotations

Instead of using the former XS Classic-related annotations @SearchIndex and @SearchIndexes to define search indexes, you now specify the indexes in the technical configuration part of *.hdbcds files.

Available CDS Annotations and Usage

The following CDS annotations are new:

● @EnterpriseSearchHana

The annotation extends the annotations defined by @EnterpriseSearch and adds annotations that are available for SAP HANA CDS only and that are used by the built-in procedure sys.esh_search().

● @Hierarchy

The annotation contains a subset of the hierarchy annotations defined as CDS core annotations. It contains only the annotations that are used by the built-in procedure sys.esh_search().

The @EnterpriseSearch annotation has a new annotation on view level:


The new file CDSTypes.hdbcds contains a type definition for CDS type ElementRef. It will be used until SAP HANA CDS supports ElementRef as a built-in type.

Note

With this release, all new and existing annotations have to be defined in *.hdbcds files. There has to be one file for each annotation, and the name of the file has to equal the name of the annotation (example: Search.hdbcds). All annotation definitions have to be added to the XS Advanced project because they have to be defined in the same container that is used to define the search views.

–  –  –

The sys.esh_config() annotations support the following new features:

● Multi-value properties

● Field/attribute groups

For the definition of field groups or attribute groups, you can use the following new annotations: @EnterpriseSearch.fieldGroupForSearchQuery, @EnterpriseSearch.fieldGroupForSearchQuery.Name, @EnterpriseSearch.fieldGroupForSearchQuery.Elements.

● Subobjects

For the definition of subobjects, you can use the following new annotations: @EnterpriseSearchHana.layoutStructuredObject.defaultExpand, @EnterpriseSearchHana.layoutStructuredObject.key.

● Leveled hierarchies

For the definition of leveled hierarchies, you can use the new entity type annotation @Aggregation, including @Aggregation.LeveledHierarchy, @Aggregation.LeveledHierarchy.Qualifier, and @Aggregation.LeveledHierarchy.Value.

Response of a Federated Search

The federated search response supports the following new features:

● Multi-value columns

● Subobjects

● Facets

● Leveled hierarchy facets

Search Query Language

The search query language was extended. You can use the following new features in search queries:

● Fuzzy search

● Use of OR and NOT operators

● Boosting

● Grouping

● Search in fields

● Search in IN-lists

The precedence list was extended.

–  –  –

Parts of Speech for Greek

Support for determining parts of speech is now expanded to cover Greek as well as all other text analysis languages.

New Korean Entity Types

In the EXTRACTION configurations, text analysis now identifies CURRENCY, TIME, and PERCENT entities in Korean text.

HDI Support for Managing TA Configuration Artifacts

Custom text analysis configuration files can be added to an XS Advanced application project just like any other database artifact. The text analysis configuration objects become available for use when the application is deployed. Database artifacts in XS Advanced are deployed by the SAP HANA Deployment Infrastructure.

SQL Extensions for Managing TA Configuration Artifacts

Custom text analysis configuration artifacts can be created and managed with stored procedures TEXT_CONFIGURATION_CREATE, TEXT_CONFIGURATION_DROP and TEXT_CONFIGURATION_CLEAR. The resulting artifacts are stored in the database, can be used during full-text index creation, and can be viewed with the TEXT_CONFIGURATIONS view.

Option for Combined Linguistic and Extraction Output

Normally, when extraction analysis is configured, linguistic analysis output (tokens) will be omitted from the $TA table in favor of extraction analysis output. A new configuration option, OutputLinguisticTokens,

–  –  –

The Public Text Mining Data Table Interface

For advanced users who would like to implement text mining algorithms of their own, the text mining term-document matrix is exposed as a system table.
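To make the notion of a term-document matrix concrete, here is a minimal Python sketch that builds one from a tiny collection. It illustrates the data structure only. The function name, the whitespace tokenization, and the sample documents are assumptions, and the layout of the actual system table is not described in this document.

```python
from collections import Counter


def term_document_matrix(docs):
    """Return (vocab, rows), where rows[i][j] counts term vocab[j] in docs[i].

    Hypothetical helper: terms are lower-cased, whitespace-separated tokens.
    """
    counts = [Counter(text.lower().split()) for text in docs]
    vocab = sorted(set().union(*counts)) if counts else []
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows


# Illustrative documents (not taken from any SAP sample data).
docs = ["red apple red", "green apple"]
vocab, rows = term_document_matrix(docs)
```

Each row corresponds to one document and each column to one term, which is the shape that custom text mining algorithms typically consume.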

Expanded Term Type Possibilities

Text mining terms can now include other parts of speech as well as text analysis entities and phrases.

–  –  –

This version of SAP HANA provides the new built-in procedures sys.esh_config() and sys.esh_search() to configure search models and execute full-text search queries.

During configuration, search annotations are created on existing views which define how full-text search queries are executed, e.g. which columns of the view are relevant for search. The search procedure takes an OData-like input and returns a JSON formatted result list, including additional information like the number of results and data needed to populate so-called facets.

The built-in procedures sys.esh_config() and sys.esh_search() expose sophisticated modeling and search capabilities directly in the database – this is especially important for applications which do not rely on the extended application services layer.

Related Information

Support of Language Codes

The LANGUAGE COLUMN parameter of the CREATE FULLTEXT INDEX statement can now take columns with SAP language codes (in addition to ISO codes).

–  –  –

Text analysis dictionaries can now be made case-sensitive. Previously they were not, so that, for example, the terms USA and Usa were recognized as the same entity. Case sensitivity is controlled by a top-level XML property in the dictionary definition: <dictionary xmlns="http://www.sap.com/ta/4.0" case-sensitive="true">. The default value is false (assumed when the case-sensitive property is not present), which keeps the feature backward-compatible.
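The default behavior described above can be sketched in Python. The sketch reads the case-sensitive property from a dictionary root element and falls back to false when the property is absent. The helper names and the comparison logic are assumptions made for illustration, and the dictionary body is left empty because its content is not described here.

```python
import xml.etree.ElementTree as ET

TA_NS = "http://www.sap.com/ta/4.0"


def is_case_sensitive(dictionary_xml):
    """Read the top-level case-sensitive property; absent means false."""
    root = ET.fromstring(dictionary_xml)
    return root.get("case-sensitive", "false").strip().lower() == "true"


def same_entry(a, b, case_sensitive):
    """Hypothetical comparison: exact match, or match ignoring case."""
    return a == b if case_sensitive else a.lower() == b.lower()


# Empty dictionary elements, used only to show the property handling.
new_style = f'<dictionary xmlns="{TA_NS}" case-sensitive="true"/>'
old_style = f'<dictionary xmlns="{TA_NS}"/>'
```

With the property absent, USA and Usa compare as equal, which matches the earlier behavior. With the property set to true, they are distinct.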

Parts of Speech

Support for determining parts of speech now covers all text analysis languages except Greek, which is the only language for which text analysis cannot determine parts of speech. Consequently, disambiguation of part of speech and stem is available for all languages except Greek.

Parent Property

The SAP HANA XS JavaScript API for text analysis now returns an additional parent property that points to the parent entity of a given entity. The parent property of a child entity contains the id value of the parent entity.

Note

This is the same information that is currently returned in the TA_PARENT column in the $TA table.
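As a rough illustration of how the parent link can be followed, the Python sketch below walks from a child entity up through its parents. Only the id and parent names come from the text above. The record layout, the use of None for an entity without a parent, and the sample values are assumptions, and this is not the XS JavaScript API itself.

```python
def ancestors(entities, entity_id):
    """Return the chain of parent ids for entity_id, nearest parent first."""
    by_id = {e["id"]: e for e in entities}
    chain = []
    current = by_id[entity_id].get("parent")
    # Stop at the top of the hierarchy, or if a cycle would repeat an id.
    while current is not None and current not in chain:
        chain.append(current)
        current = by_id.get(current, {}).get("parent")
    return chain


# Hypothetical result records; "text" is an illustrative field.
entities = [
    {"id": 1, "text": "John Smith", "parent": None},
    {"id": 2, "text": "Smith", "parent": 1},
]
```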

Grammatical Role Analysis

This release adds grammatical role analysis to text analysis. Its primary goal is to identify functional (grammatical) relationships between elements in an input sentence. Currently, the only supported language is English, and the supported grammatical roles are those pertaining to relations between verbs and their arguments, for example Subject and DirectObject. The one exception is the grammatical role PredicateSubject (see the grammatical roles documentation for complete definitions and examples). Grammatical role analysis comes with its own set of new annotation types.

Refer to the SAP HANA Text Analysis Language Reference Guide for detailed description.

New Text Analysis Configuration GRAMATICAL_ROLE_ANALYSIS

With this release, a new standard text analysis configuration named GRAMATICAL_ROLE_ANALYSIS is available. This configuration is used for grammatical role analysis.

–  –  –

This release offers tolerant spelling support for English, Dutch, German and Italian. Tolerant means that linguistic analysis of those languages can be aware of some (but certainly not all) variations in capitalization, accents or hyphenation. Tolerant support is enabled by setting the VariantString property to expanded in your text analysis configuration settings.

Note

This setting is already enabled in the standard text analysis configurations that ship with SAP HANA.
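The idea behind tolerant matching can be illustrated with a small Python sketch that maps spelling variants to a common key by ignoring capitalization, accents, and hyphens. This is an assumption-laden illustration of the concept only, not the algorithm that SAP HANA text analysis uses, and the function name is hypothetical.

```python
import unicodedata


def tolerant_key(token):
    """Map a token to a key that ignores case, accents, and hyphens."""
    decomposed = unicodedata.normalize("NFKD", token)
    # Drop combining marks such as the acute accent in "é".
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_accents.replace("-", "").casefold()


# Variants that differ only in accents, capitalization, or hyphenation.
same_accent = tolerant_key("Café") == tolerant_key("cafe")
same_hyphen = tolerant_key("E-Mail") == tolerant_key("email")
```

A real linguistic analyzer is more selective than this sketch, since the text above states that only some variations are covered.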

Metadata in $TA Table

The $TA table can now contain metadata extracted from input documents. The TA_RULE column contains the marker Metadata, and the TA_TYPE column specifies the type of the metadata (for example, FromName in metadata extracted from emails).
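A consumer of $TA rows could separate metadata from other output by checking the marker, as in the Python sketch below. The TA_RULE and TA_TYPE names and the Metadata and FromName values come from the text above. The in-memory row format, the TA_TOKEN field, and the remaining sample values are assumptions for illustration.

```python
def metadata_by_type(ta_rows):
    """Group tokens of rows marked as Metadata by their TA_TYPE."""
    grouped = {}
    for row in ta_rows:
        if row.get("TA_RULE") == "Metadata":
            grouped.setdefault(row["TA_TYPE"], []).append(row.get("TA_TOKEN"))
    return grouped


# Hypothetical rows, as they might look after being fetched into dictionaries.
ta_rows = [
    {"TA_RULE": "Metadata", "TA_TYPE": "FromName", "TA_TOKEN": "Jane Doe"},
    {"TA_RULE": "Entity Extraction", "TA_TYPE": "PERSON", "TA_TOKEN": "Jane Doe"},
]
result = metadata_by_type(ta_rows)
```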

Overview

Text mining derives information about a collection of documents by examining the terms used within the documents. It supports functions to identify related and relevant documents and terms within the collection. It also supports methods for categorizing (classifying) new documents, relative to a set of (pre-categorized) reference documents.

Supported Languages

Text mining now works for all languages supported by text analysis.

–  –  –

Overview

Text mining derives information about a collection of documents by examining the terms used within the documents. It supports functions to identify related and relevant documents and terms within the collection. It also supports methods for categorizing (classifying) new documents, relative to a set of (pre-categorized) reference documents.

SQL Functions

Text mining is now exposed via the following SQL functions:

● TM_GET_RELATED_TERMS returns top-ranked terms related to a term

● TM_GET_RELEVANT_TERMS returns top-ranked relevant terms (keyphrases) that describe a document

● TM_GET_SUGGESTED_TERMS returns top-ranked terms matching an initial substring

● TM_GET_RELATED_DOCUMENTS returns top-ranked documents related to a document

● TM_GET_RELEVANT_DOCUMENTS returns top-ranked documents relevant to a term

● TM_CATEGORIZE_KNN enables the categorization functionality
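For readers unfamiliar with k-nearest-neighbor categorization, the Python sketch below shows the general technique that the name TM_CATEGORIZE_KNN suggests: a new document receives the majority category of its k most similar reference documents. It is a conceptual illustration under stated assumptions (whitespace tokens, raw term counts, cosine similarity, a non-empty reference set) and does not reproduce the signature or internals of the SQL function.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def categorize_knn(text, reference, k=3):
    """Return the majority category among the k most similar reference docs."""
    query = Counter(text.lower().split())
    scored = sorted(
        ((cosine(query, Counter(doc.lower().split())), cat) for doc, cat in reference),
        reverse=True,
    )
    votes = Counter(cat for _, cat in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical pre-categorized reference documents.
reference = [
    ("stock market shares fall", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins football match", "sports"),
    ("coach praises football players", "sports"),
]
category = categorize_knn("football team loses match", reference, k=3)
```

Here the two sports documents share terms with the query, so they outvote the single finance document among the three nearest neighbors.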

–  –  –

5.1 Overall Documentation Changes

The documentation has changed as follows:

● The Enabling Search chapter was taken out of the SAP HANA Developer Guide and is shipped as a standalone SAP HANA Search Developer Guide. The structure of the SAP HANA Search Developer Guide was changed to a more task-oriented approach. The guide starts with the definition of persistency and indexing, followed by the creation of search models. It continues with accessing the data using full-text search and closes with the creation of search UIs.

● The Text Analysis section from the former Enabling Search chapter was taken out of the structure and is shipped as a stand-alone SAP HANA Text Analysis Developer Guide.

● Formerly both the SAP HANA Text Analysis Extraction Customization Guide and SAP HANA Text Analysis Language Reference Guide were shipped as PDF only. Effective with SPS 09, they both ship as HTML only.

