Geekzone: technology news, blogs, forums
Guest
Welcome Guest.
You haven't logged in yet. If you don't have an account you can register now.


Kiwifruta

1423 posts

Uber Geek

ID Verified

#233750 1-May-2018 10:00
Send private message

Hi

 

I've been a commercial analyst for just over a year and spend a lot (most) of my time writing (mostly ad-hoc) SQL queries, mostly in Oracle's SQL Developer but also in Microsoft's SQL Server. Prior to this role I had no experience with SQL and only hobby experience and 3 university courses that involved coding. I enjoy coding a lot and want to raise the quality of my code especially the outputted results and also my productivity. 

 

So far I've read two solid texts on Oracle SQL and T-SQ, and "The Clean Coder" by Robert C. Martin. All books have been superb introductions to the languages and some best practices (unit testing).

 

I'm now wanting to have a more robust method for testing the quality of the data my queries output before I hand them over to management.

 

I've asked around the office, googled and looked around Amazon but haven't found much pertaining to testing SQL queries, although there is a lot relating to test driven development it mostly relates to non-SQL languages.

 

What I've come up with so far is

 

i) test what others tell me about the tables and test the opposite

 

ii) check for duplicates where there shouldn't be any

 

iii) break the query down to bite size pieces (UNIX approach do one thing and do it well) and put the output in to views or tables

 

iv) run unit tests over those views/tables

 

v) get the plan, code, output peer reviewed by a SME

 

vi) try a different approach to answer the question. Are the results similar?

 

vii) spot check the results

 

viii) write down all the checks I've done

 

ix) visualise the data results

 

 

 

What I'd like to find is a thorough way to test, that can pick up the things I didn't know that I didn't know. 

 

 

 

Thanks in advance

 

 

 

 


Create new topic
Regs
4066 posts

Uber Geek

Trusted
Snowflake

  #2006137 1-May-2018 16:47
Send private message

Interesting.

I think the way i would typically validate results for reporting style queries would be to do checks along the way. Does sales total joined tables = total sales from the raw transaction table. How about compared to reports from othe systems. Etc

I also do a lot of visualisation - both with numbers and graphs - to spot anomalies and prove them right or wrong

For transactional queries, unit tests definitely can be used.

For efficient queries, playing with indexes and viewing query execution plans is the key mechNism.

Everything you suggested sounds sensible, don't think there are silver bullets






jonherries
1395 posts

Uber Geek

Trusted
Subscriber

  #2006217 1-May-2018 19:49
Send private message

My experience sounds similar to both of yours.

I think testing is hard unless you understand what the answer should be, especially if there isn't a robust data reuse/MDM toolset and approach (ie. unit creation). By contrast in an application development environment - development tools have been developed in this way (modularity, IDEs, OOP etc) and so success can be defined a lot easier which makes testing easier.

Generally I have found if the answer feels unexpected or wrong - then it probably is (I realise this is problematic though ie. confirmation bias). The feel for this comes from regular analysis of the data or understanding the business/customer process that generates the data.


Jon

TwoSeven
1623 posts

Uber Geek

Subscriber

  #2006232 1-May-2018 20:07
Send private message

If it helps, the field I think you are talking about I would suggest is called “informatics” rather than programming as such.

If one is working with relational data, then the organisation of the data can be considered important - the technique typically used is called ‘normalization’ and the most common way of arranging data is in 3rd normal form. To me, this gives the best balance of self testability and flexibility (ability to change the model).

A “model” that is effectively normalised makes it harder to inject poor data as the rules tend to prevent this type of thing. Consider the relationship, person -> name. A person can have multiple names but must have at least one. Therefore creating a person entity would fail if the constraint check shows that there is no existing name.

Also, in informatics I would make the suggestion that the meaning of the data (in the business context) is just as important as the data itself. If the data returned does not answer the question that is being asked, then is it the right data? Often testing effort can be around creating user scenarios that identify the question is being asked. The test case is there to ensure the data retrieved fulfills the scenario.

Typical questions are how, what and when. One that is often missed is “why”. Why does that data have the value that it does?




Software Engineer
   (the practice of real science, engineering and management)
A.I.  (Automation rebranded)
Gender Neutral
   (a person who believes in equality and who does not believe in/use stereotypes. Examples such as gender, binary, nonbinary, male/female etc.)

 

 ...they/their/them...




Kiwifruta

1423 posts

Uber Geek

ID Verified

  #2006990 3-May-2018 11:00
Send private message

@TwoSeven I didn't quite follow your post. 
Are you talking about setting up tables in a database or extracting data from an existing database?

 

Could you elaborate please?


TwoSeven
1623 posts

Uber Geek

Subscriber

  #2007427 3-May-2018 19:22
Send private message

More higher order techniques.

When working with unstructured data (real world attributes and values) - it only becomes meaningful when it is given some structure and classification. When the structure also includes the behaviour, the result is called a domain model.

When the data does not contain the behaviour the result of the structure can be called a data model. Typically represented by an entity relationship diagram.

Whatever type of model you have, it can be stored in a relational form (in a relational database). The technique of translating the model into its relational form is called normalisation.

In normalisation the most ideal form (way of translating the structure in relational form) is what is called 3rd normal form. This is a set of rules on how to create the tables, data types, rules, checks and constraints used to store the data. To put a record into a table, all of the rules and constraints must be evaluated. The same for all of the other operations (create, read, update etc).

By putting the [testing] rules into the structure of the database, it becomes increasingly difficult to create invalid records or to generate unexpected behaviour. In effect your unit tests are built into your relational model.

Using models to represent things is part of model driven development/architecture. Using domain models can be useful when performing “domain driven development”.




Software Engineer
   (the practice of real science, engineering and management)
A.I.  (Automation rebranded)
Gender Neutral
   (a person who believes in equality and who does not believe in/use stereotypes. Examples such as gender, binary, nonbinary, male/female etc.)

 

 ...they/their/them...


Kiwifruta

1423 posts

Uber Geek

ID Verified

  #2027696 2-Jun-2018 22:03
Send private message

@TwoSeven thanks

Create new topic





News and reviews »

Air New Zealand Starts AI adoption with OpenAI
Posted 24-Jul-2025 16:00


eero Pro 7 Review
Posted 23-Jul-2025 12:07


BeeStation Plus Review
Posted 21-Jul-2025 14:21


eero Unveils New Wi-Fi 7 Products in New Zealand
Posted 21-Jul-2025 00:01


WiZ Introduces HDMI Sync Box and other Light Devices
Posted 20-Jul-2025 17:32


RedShield Enhances DDoS and Bot Attack Protection
Posted 20-Jul-2025 17:26


Seagate Ships 30TB Drives
Posted 17-Jul-2025 11:24


Oclean AirPump A10 Water Flosser Review
Posted 13-Jul-2025 11:05


Samsung Galaxy Z Fold7: Raising the Bar for Smartphones
Posted 10-Jul-2025 02:01


Samsung Galaxy Z Flip7 Brings New Edge-To-Edge FlexWindow
Posted 10-Jul-2025 02:01


Epson Launches New AM-C550Z WorkForce Enterprise printer
Posted 9-Jul-2025 18:22


Samsung Releases Smart Monitor M9
Posted 9-Jul-2025 17:46


Nearly Half of Older Kiwis Still Write their Passwords on Paper
Posted 9-Jul-2025 08:42


D-Link 4G+ Cat6 Wi-Fi 6 DWR-933M Mobile Hotspot Review
Posted 1-Jul-2025 11:34


Oppo A5 Series Launches With New Levels of Durability
Posted 30-Jun-2025 10:15









Geekzone Live »

Try automatic live updates from Geekzone directly in your browser, without refreshing the page, with Geekzone Live now.



Are you subscribed to our RSS feed? You can download the latest headlines and summaries from our stories directly to your computer or smartphone by using a feed reader.