I am a true believer in automation, so the last thing I want to do is create a process or workflow that I or someone else has to babysit. When I began working on a project to see what type of data I could get out of Jive, an online community application, and into Omniture, I wanted to make sure the solution could be fully automated.
Before we get into it, there is a key point I want to make about the data. The database that you get from Jive is not your typical clickstream data set; it is an activity database. You will not get page names or visitor session information. What you will get is an activity logged for everything a user does on the community site: viewing, creating, or modifying a thread, blog post, question, poll, etc. If the user is logged in, you get their user ID, name, and so on. If the user is not logged in, they all show up as a single user. Even with this limitation, there is still plenty of useful data to be found.
Configuration requirements/prerequisites: While the end solution is fully automated, there is a fair amount of configuration that must be completed first.
ColdFusion Server: The main controlling files are .cfm pages that are called via scheduled tasks in the ColdFusion Administrator. This process could also be accomplished using your scripting/coding language of choice.
PostgreSQL: The .dmp file that Jive provides is in the PostgreSQL format, a free, open source database solution. I found that the built-in admin tool works well enough for setting up the database and user to be used by the scripts, and for creating any views to assist with the ETL process, but it is severely lacking in terms of export capabilities.
- Create 2 views to pre-process the data for extract
- Known Users
- Unknown users / guests
- Define headers of the views to align with Omniture Data Sources file requirements.
SQL Workbench/J: A free, open source database admin tool with exceptional data export features, all of which are accessible via the command line interface. Once you install the tool, you will need to create a connection profile to be used in all scripts.
- Create SQL queries to perform data extract
- Need one .sql file for each query
- Creates a .txt file with data ready to FTP
- Creates a matching .fin file to FTP
Command line / batch file access: While most of the controlling files are ColdFusion pages, the files that perform the restore and data extraction are batch files. Use whatever method of calling these batch files you are most comfortable and confident with.
Omniture Report Suite: Before uploading any data, make sure you have planned out what data points you want to capture. Walk through the tables in the .dmp file to see what elements are most important to your business needs. Once you have this, work with Client Care to set up the report suite. Below is a list of some of the items you will want to consider.
- Define props / evars / events
- Enable custom visitorID (uses userID from Jive)
- Enable timestamps
- Generic Data Source setup with full processing
- FTP information
Local Folders: Working directory to hold batch files, cfml pages, data extracts, and a sub folder to move completed files to.
Once all of the configuration steps have been completed, you can begin tackling the automation.
Work Flow – Steps
- Download the daily Analytics/Activity .dmp file from Jive
- Restore the .dmp to the local PostgreSQL db
- Extract data
- FTP data extracts to Omniture
Only 4 steps! That is not so bad. So let's look at what is involved in each step.
Step 1: Jive provides an “Analytics” database as either a .sql.gz or a .dmp file. I say “Analytics” because the data does not contain all of the traditional analytics information: page views, etc. What it does hold is activity. Every time a user views a thread, blog post, poll, and so on, that is logged, as is when users create content for the site. All of this information is held within the myriad tables in the .dmp file. The database is in PostgreSQL format.
The challenge is how to automate the download of this file. I used a CFHTTP call, as it allows me to provide a user ID and password to authenticate into the Jive Cloud Admin site to retrieve the file. It is also an easy way to customize the filename to include the date. Each .dmp file is named in the following format: accountname-yyyymmdd.dmp
<cfhttp username="yourLogin" password="yourPassword" method="GET" url="#variables.URLLocation#" result="dmpGet" resolveURL="yes" getAsBinary="yes" />
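Because the filename embeds the date, yesterday's file can be addressed without any scraping. A minimal sketch of building that name (in Python rather than CFML, purely for illustration; the account name is a placeholder):

```python
from datetime import date, timedelta

def dmp_filename(account: str, day: date) -> str:
    """Build the Jive dump filename: accountname-yyyymmdd.dmp."""
    return f"{account}-{day:%Y%m%d}.dmp"

# Jive's dump covers the previous day, so target yesterday's date.
print(dmp_filename("youraccount", date.today() - timedelta(days=1)))
```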
Once you have written the script to pull the database, you have to schedule the download around the processing that Jive performs. Each day, they delete the previous day's file and begin the process of creating the next day's file. This can take anywhere from 2-6 hours, and during that time the new .dmp file is not there. Each file is a full dump, so don't worry if you miss a day; you will always be able to go back and retrieve past information.
If the download attempt finds that the .dmp file is not yet ready, I create a db_download.fail file in the working directory; this is mostly used for debugging purposes. Once the file is successfully downloaded, I create a db_download.done and a db_download-yyyymmdd.done file. The db_download.done file serves as a flag to the next step that the .dmp file is there and ready to be restored. The db_download-yyyymmdd.done file lets the download process know not to attempt to download the .dmp file again that day.
Step 2: Restore .dmp to local postgreSQL
The controlling CFML page checks to see if db_download.done exists. If it does, it executes a batch file that uses a command line call to restore the database, and once the restore is finished it creates a db_restore.done file. It also moves the db_download.done file to the processed folder within the working directory and renames it to db_download-yyyymmdd.done.completed.
"C:\Program Files\PostgreSQL\8.4\bin\pg_restore.exe" --clean --host %server% --port 5432 --username %username% --dbname %database% --verbose D:\WorkingDirectory\db.dmp
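The restore could equally be driven from any scripting language. A rough Python equivalent of the batch file's command assembly (the server, credentials, and paths are placeholders, and the actual run is left commented out):

```python
import subprocess

# Placeholder install path, matching the batch file above.
PG_RESTORE = r"C:\Program Files\PostgreSQL\8.4\bin\pg_restore.exe"

def build_restore_cmd(server: str, username: str, database: str, dmp_path: str):
    """Assemble the same pg_restore invocation the batch file runs."""
    return [PG_RESTORE, "--clean",
            "--host", server, "--port", "5432",
            "--username", username, "--dbname", database,
            "--verbose", dmp_path]

# subprocess.run(build_restore_cmd("localhost", "jive", "jivedb",
#                                  r"D:\WorkingDirectory\db.dmp"), check=True)
```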
Step 3: Extract Data
The controlling file checks to see if the db_restore.done file exists. Once it finds it, it executes each predefined .sql file. These files create a daily-dataset.txt file that is properly formatted to be FTP'd to Omniture, along with a .fin file of the same name to be FTP'd as well. Once all of the .sql files have been processed, it moves the db_restore.done file to the processed folder within the working directory and renames it to db_restore-yyyymmdd.done.completed. The controlling file then creates a db_extract.done file.
-- pseudo code: export yesterday's data to the daily .txt file
WbExport -type=text -file='d:/workingdirectory/dailyfile.txt' -header=true;
SELECT Your data
WHERE to_char(data.date,'mm/dd/yyyy') = to_char(current_date-1,'mm/dd/yyyy');

-- a second WbExport creates the matching .fin file
WbExport -type=text -file='d:/workingdirectory/dailyfile.fin' -header=false;
SELECT * FROM jivedw_user WHERE user_id = -99;
Step 4: FTP completed files to Omniture
The controlling file checks to see if the db_extract.done file exists. Once it finds this file, I use CFFTP to open a secure connection to the Omniture FTP server defined in the configuration. Each data file is FTP'd to Omniture along with its matching .fin file. Again, these files are moved to the processed folder in the local working directory and renamed with the timestamp.
Next steps? Or as Avinash likes to say… So What?
Now that you have a daily process that will pull, transform, and load the activity data from your Jive installation into Omniture, what else can you do and discover?
- Find out which areas of the community are generating the most content
- Create automated report publications to community administrators / moderators
- Easily identify the most active users and content
These are just a few things you can do with the activity data once you have it inside of Omniture. What are some of the things you will use the data for?