SAGE Computing Services Customised Oracle Training Workshops and Consulting Oracle Text in Apex Advanced Indexing Techniques Integrated with Application Express Scott Wesley Systems Consultant & Trainer
Agenda • Introduction • Architecture • Fundamentals • Considerations • Setting Up • Samples • Index Maintenance • Visualisation • New Features
Larry Lessig?
the law is strangling creativity http://www.ted.com/talks/larry_lessig_says_the_law_is_strangling_creativity.html http://presentationzen.blogs.com/presentationzen/2005/10/the_lessig_meth.html
Identity 2.0 – Dick Hardt http://identity20.com/media/OSCON2005/
who’s the Dick on your site
Connor McDonald http://www.oracledba.co.uk
so today’s going to be more like this
and this
after I show a few pictures
who_am_i;
http://strategy2c.wordpress.com/2009/01/10/strategy-for-goldfish-funny-illustration-by-frits/
balance
Why use Oracle Application Express?
Why use Oracle Text?
What is Oracle Text?
Document Collection
Catalogue Information
Document Classification
Architecture
Class Description Datastore How are your documents stored? Filter How can the documents be converted to plain text? Lexer What language is being indexed? Wordlist How should stem and fuzzy queries be expanded? Storage How should the index data be stored? Stop List What words or themes are not to be indexed? Section Group How are documents sections defined?
1) Example
CREATE INDEX ctx_name ON my_names(name) INDEXTYPE IS ctxsys.context PARAMETERS ('DATASTORE CTXSYS.DEFAULT_DATASTORE');
SQL> SELECT SCORE(1), name 2 FROM my_names 3 WHERE CONTAINS(name, 'fuzzy(john,,,weight)', 1) > 0 4 ORDER BY SCORE(1) DESC; SCORE(1) NAME ---------- ---------------------------------------- 100 John 100 John 70 Jon 70 Jon 63 Joan 63 Joan 52 Jong 48 Jona 8 rows selected.
2) Datastore
CTXSYS.DEFAULT_DATASTORE
BLOB
BFiles
Pointers to objects on file system
URLs
Pointers to objects on the intertube
User Defined
Why would you?
3) Index Type
a) CONTEXT
Document Collection
large document size
provides a score
asynchronous index & table data
CONTAINS
b) CTXCAT
Catalogue Information
smaller documents
text fragments
multiple attributes
set lists
similar to typical index paradigm
transactional
CATSEARCH
c) CTXRULE
Document Classification
routing information
displace manual interaction
not binary files
MATCHES
4) Considerations
location of text
document format
bypassing rows - images
character set
language
fuzzy matching & stemming
wildcard query performance
stopwords & stopthemes
query performance and storage of LOBs
mixed queries
5) Setting up
GRANT ctxapp TO ausoug;
create & delete indexing preferences
use Oracle Text PL/SQL supplied packages
1* select grantee, owner, table_name, privilege from dba_tab_privs where table_name = 'CTX_DDL' SQL> / GRANTEE OWNER TABLE_NAME PRIVILEGE -------------------- ------------ ------------------------------ -------------------- CTXAPP CTXSYS CTX_DDL EXECUTE APEX_040000 CTXSYS CTX_DDL EXECUTE APEX_030200 CTXSYS CTX_DDL EXECUTE AUSOUG CTXSYS CTX_DDL EXECUTE XDB CTXSYS CTX_DDL EXECUTE 5 rows selected.
PLS-00201: identifier "string" must be declared
CTX PL/SQL Packages GRANT EXECUTE ON CTXSYS.CTX_CLS TO ausoug; GRANT EXECUTE ON CTXSYS.CTX_DDL TO ausoug; GRANT EXECUTE ON CTXSYS.CTX_DOC TO ausoug; GRANT EXECUTE ON CTXSYS.CTX_OUTPUT TO ausoug; GRANT EXECUTE ON CTXSYS.CTX_QUERY TO ausoug; GRANT EXECUTE ON CTXSYS.CTX_REPORT TO ausoug; GRANT EXECUTE ON CTXSYS.CTX_THES TO ausoug; GRANT EXECUTE ON CTXSYS.CTX_ULEXER TO ausoug;
Using URL Datastore in 11g CREATE ROLE apex_url_datastore_role; GRANT apex_url_datastore_role TO APEX_040000 WITH ADMIN OPTION; GRANT apex_url_datastore_role TO ausoug; EXEC ctxsys.ctx_adm.set_parameter ('file_access_role' ,'APEX_URL_DATASTORE_ROLE');
Demonstrations Script Description Ctx_blobs.sql Import & index a range of documents Ctx_bfiles.sql Import & index BFILE pointers Ctx_urls.sql Index & search URL references Ctx_dict.sql Index & search English dictionary words Ctx_views.sql Index view SQL text for impact analysis Ctx_apex_files.sql Duplicate and search Apex file repository Ctx_apex_backups.sql Hunt through your (automated) Apex app backups Ctx_names.sql Basic name filter options Ctx_products.sql Multiple column searches Ctx_category.sql Attribute based searching Ctx_classify.sql Classify documents into categories
6) Index maintenance
indexing errors
resume failed index
ALTER INDEX ctx_surname REBUILD PARAMETERS ('resume memory 10m');
recreate index online (11g)
EXEC ctx_ddl.recreate_index_online ('ctx_surname', 'replace lexer sw_lexer');
rebuilding an index
ALTER INDEX ctx_surname REBUILD PARAMETERS('replace lexer sw_lexer') ONLINE;
ctx_report.index_stats
create table ausoug.my_stats (stats clob); declare x clob := null; begin for r_rec in (select * from ctxsys.ctx_indexes where idx_owner = 'AUSOUG' and idx_type = 'CONTEXT') loop ctx_report.index_stats(r_rec.idx_name,x); insert into ausoug.my_stats values (x); end loop; commit; dbms_lob.freetemporary(x); end; /
7) Data Dictionary
SQL> select count(*) 2 from all_views 3 where owner = 'CTXSYS'; COUNT(*) ---------- 58
8) Common Questions
DML operations on a CONTEXT index
ctxsys.ctx_user_pending
synchronise the index synchronize
EXEC ctx_ddl.sync_index('ctx_surname');
dbms_job
dbms_scheduler
how often?
optimise the index
can get fragmented
inverted index
each entry contains list of documents
DOG - DOC1 DOC3 DOC5 DOG - DOC7 DOG - DOC9 DOG - DOC11
ctx_ddl.optimize_index
capacity planning?
Object of Interest Num Rows Table Size Index size Dictionary 150k 7 27 Documents 28 34 1.5 Names 27k 1 6 Views 2k 7 2 BFiles 4 Product 1 URL 1
more text
cleaner data
less overhead
document format
next steps?
read Application Developer’s Guide
find examples
experiment
SAGE Computing Services Customised Oracle Training Workshops and Consulting Question time Presentations are available from our website: http://www.sagecomputing.com.au enquiries@sagecomputing.com.au scott.wesley@sagecomputing.com.au http://triangle-circle-square.blogspot.com

Oracle Text in APEX