Blog powered by TypePad

January 2008

Sun Mon Tue Wed Thu Fri Sat
    1 2 3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31    

January 29, 2008

Working Lighter

So, I have a new job and new career.  Actually, it's a new direction in my career.  I am a Data Architect at a major independent Oil and Gas company.  The challenge is huge and it's been just about a month and it's still great.  Can't wait to go to work each day!
But with this comes a change.  I am tired of dragging my tablet around downtown, so I am leaving in a locked drawer at the office.  So, that's okay but we all know that I take my work home... so how do I take my work home?  Well, thank goodness for USB sticks.  But a lot of things that I like to work with (wiki's, mind maps, rss feeds, etc) are not available via the corporate PC.  So, what to do?  Well, portable apps to the rescue.  I have decided to find any/all applications that can run from the USB stick.  So, here is my list:
- RSSBandit
- Oracle's SQL Developer
- Snippy
- Firefox
- NotePad++
- KeePass
- VTN
- DoubleKiller
- MyUnInstaller
- RegScanner
- SQLMonitor
- SQLiteAdmin
- various SysInternal applications
- TiddlyWiki

I will update the list as I use it over the next few weeks.

January 04, 2008

Starting afresh

So, it's a New Year.
And what is my resolution(s) for this year.
Well, nothing that I am putting to electronic print, that's for sure...

But, I am and always be intrigued by process, what's there, how it's used, who uses it, who doesn't, etc.  So, with that in mind, I am going try and use the PPDM v3.8 data model to truly capture process.  I have a real nice little query that details a lot of information about privileges and grants in an Oracle database that I want to load into the PPDM meta-model.  So, I will.  And I will use this blog as a way of explaining what/how I have done.

So, I have done the preliminary data mapping between the result set from the query and tables in PPDM.  And, as usual; it does not map 100%.  So, I have the option of either trying to get the PPDM to alter the model or just do it myself.  Now, that means that I don't have a 100% compliant PPDM implementation, but then, who does?

The next step is to define how to load/update the data.  Here is what I have in mind:
- select an Oracle ID that I want to put my temporary load table in
- create temporary table & sequence
- write the sqlldr script to purge & then load the temp table
- grant permissions (Select, Insert, Update, Delete) on PPDM tables that I access
- write a data loader that will load the data as well as track that I loaded the data.  I have not mapped out how to do this, but it's on the list

One last problem is where to store what I have done.  I like MindManager as it's a very graphical way to display what you have done.  I also like my wiki as it's really portable.  And then there's this blog...

November 25, 2007

Review what I did, find some problems, fix it and repeat

I imported the schema into my XE database on my tablet computer and was looking at the data via my Load of the Rings application and found that the PPDM Groups was only populated with 'CORE' tables, not the referenced tables.  So I went back to the main database and set about populating these rows.

Well, that uncovered that I did not populate the reference constraints properly, so I had to fix that first and then populate the 'REFERENCED' tables. 

That's all done and I will export the schema again, copy it to my tablet and then what else I find out that I missed.

November 20, 2007

Getting Started

Okay, so I have a new set of DDL from the PPDM to load up the RMOTC data set.  Now where to start?  Well, here are the steps that I did.

  1. Create a new Oracle schema called P38 with connect & resource.  Not very exciting or current, but it works.
  2. edit the .sql script that came with it so a .log file is created and I can see what, if any errors are create.
  3. run the .sql script file and review the .log file; no errors.  Okay, I have a empty PPDM v3.8 Alpha schema with all 1692 tables and full RI.  Yippy.
  4. Have to load the base tables, R_SOURCE & R_PPDM_ROW_QUALITY.  There are two ways of doing this; the first is to run a series of SQL Insert & Update statements in the right order or disable the RI between the two tables, run simple SQL Insert statements and then turn the RI back on.  That's the key, turning the RI back on.
  5. The next step is to populate the R_* tables of the tables that I want to use.  Fortunately I have a series of SQL Insert files kicking around and just modded these.  I did review what was stored in the row_quality and source columns.
  6. Now on to the capturing the meta-data object info.  Again, I have a series of SQL statements that do this.  I had to assign 'dba' to the account; temporarily, so that I could query the 'dba_*' views.  I got lazy about this but what the hay.  It's my database.  I did remember to revoke this privilege at the end.
  7. Did an export at this point and imported the dataset into the Oracle XE database on my tablet.

November 17, 2007

A Journey

Hey, it's been awhile.
Where have I been?

Mainly trying to wrap up stuff at the lake and then prep for the PPDM Fall conference.  I volunteered to try and load a public data set into the PPDM v3.8 data model.  The reason for this was to provide a standard data set that shows the integration of PPDM.  So, that's done and two things came out of it:
- members showed great enthusiasm about it (but that's about it)
- meta data should be able to store what we decided and how / where / why /when we did it.

So this blog is a great place to document what was done in the project and where it could be stored in the meta-data.

August 26, 2007

Third Type of Data

Okay, so I am helping out the PPDM load data into the new version; 3.8 with some Teapot Dome data.  And it's going well.  We are starting to use the PPDM to feed itself and that's always a good thing.  But what I am having problems with is working or talking about this third type of data.  Let me explain. 
We all know about reference data.  That's data that we use for lookups, like country, province, test type, etc.  That's the first type.  We know about the second type; subject area data.  This is data that is pertinent about only the subject .  Wells, Land, Seismic, etc.  But what about the 3rd type?  Data that displays characteristics of both reference and subject data?  Take Business Associate as an example.  It has information about BA's.  Quite detailed data in-fact.  So it's subject data right?  But wait, it's used as a reference or lookup in just about every other subject data set, so it's reference as well. 
So, you can see the quandary.  Which type is it?  What is it?  Other examples include field, coordinate system, UOM,  etc.  I believe the PPDM calls this data 'support' data and that's okay.  However, I would like to propose a new name for it; organizational data.  The definition of organizational data is that it supports the organization by being used in many different applications/departments because they
all have a need for a single common data set.  Maybe 'common data' might be a better term.   As I think about common vs organizational; I think that I like the impression of what 'organizational' gives vs what common says.  Maybe that's just me, but I do want to find a good term for this 3rd type and start to use/explain/preach about it.

August 17, 2007

R_CHICKEN_OR_EGG

So, which comes first?
That's always the quandary. 
Especially when you want to start to populate a PPDM database without turning off any constraints.  Do you populate the R_SOURCE table first?  But then, how can you put a value in the Row_Quality column?  Okay, so then let's populate the R_PPDM_ROW_QUALITY table first.  And you are going to put what in the Source column?
So, which comes first?

Well, it really does not matter.  If you take a bit of time, you can populate these two tables with a couple of SQL statements, without disabling constraints.  How do you do it?  Well, lets find out.  Say I wanted to store PPDM and dMCI as sources with REAL, DERIVED & NOT CLASSIFIED as row quality values.  One way is:

  • store PPDM as a source, without a row quality or source value
  • store REAL & NOT CLASSIFIED as row quality with the source of PPDM
  • update the row quality value in the row in the source table to REAL
  • add dMCI as a source, with a row quality of NOT CLASSIFIED and source of PPDM
  • add DERIVED as row quality with the source of dMCI

There you have it.  From this, you should be able to build on populating your database.  That's all there is to it.  You could also start with populating the row quality table first; just make sure that you don't put anything into the source column in the initial insert.  You have to update it later.
There are several other tables in PPDM that have this type of relationships.  You just have to slow down, think about it and realize that it will take a couple of statements to make it happen. 

Now, back to that chicken....

August 05, 2007

Too much fun

Well, I finally got the Oracle package working for the PPDM Association.  This should greatly help speed up the release of v3.8.  I can't wait for it to come out as the meta-data section promises to be nice to work with. 
I am on holidays up at Sylvan Lake and of course, expected to get a bunch of stuff done.  Computer/data stuff.  Well, about all I got done is some water-skiing, wake-boarding and the new sport: wake surfing.  Not that I am complaining, either. 
Did fine one interesting product;  FileList.  I needed to list all of the files in a directory (and the subsequent sub-directories), when they were created and if I could, get a column of all of the unique file-types.  Now, you can do this with a dos command like:  dir /a /b >output.csv etc.  And that's what I did and spent two plus hours editing the output into a usable format.  After I was done, I then looked for this program and be darned if I did not find it; on my own c:\ drive.  So, I redid my work in a couple of minutes and liked the result even more than what I did by hand.  So, the lesson here?  Be lazy and expect that someone else has done what you need done before and go & find it.
Now to got on to serious business... logical data architecture.

July 27, 2007

Less is More

I went to a 2 day seminar put on by the Alberta Data Architects called Data Modeling MasterClass.  It was led by Graeme Simsion.  I thought it was a great two days.  Graeme lead the class through a review of his book:  Data Modeling Essentials: 3rd Edition.  While you cannot cover a book in two days, we got a thorough review at 50,000 ft.  I thought the personal anecdotes were superb  Graeme has been around the world (literally and figuratively) in his work on data models and there are probably hardly any (okay, none) business requirements that he has not seen.  Themes from his presentation are:

  • gather business requirements, get clarification and gather business requirements
  • build a model several times; each differently.  Each is not supposed to be right; it is just supposed to give you perspective.
  • don't start with a blank page.  Build on the strength/wisdom of others
  • Communication is paramount;  between business staff, data modelers, dba's, etc.  Just talk people!
  • Keep up with technology;  you never know when 'cool things' are going to show up in a production environment.
  • Did I mention communication?

There are more things I want to review/talk about but I gotta hit the web with some searches that were triggered from yesterday.

July 23, 2007

FUD Man

FUD...

Fear
Uncertainty
Doubt

Man, don't you hate a sales man who does not actually sell their product; but instead raises FUD on the competitor?  Usually, they are good at this, really good so that you don't even notice it.  In fact, you think that the concern that you have is your own.  It's only when you have time to sit back and think about it, that you realize that you have spent more thought/energy on something that you did not have a problem AND not on what you did.
LIke I said, it just rips me.  And I got caught up with a FUDMan scenario last week.  Took awhile to realize that I was chasing the FUD.  The problem was not in our process, or the business unit but in the supply.