Skip to main content

JDBC formula reference

Overview

This document provides a comprehensive reference for JDBC formulas supported in DataGOL. These formulas enable powerful data processing, analysis, and transformation capabilities directly within your DataGOL workflows.

What you can achieve?

  • Data aggregation: Calculate sums, averages, counts, and statistical measures

  • Text processing: Clean, format, and manipulate string data

  • Date/Time operations: Parse, format, and calculate with dates and timestamps

  • Conditional logic: Implement business rules and data validation

  • Mathematical operations: Perform calculations and statistical analysis

Mathematical functions

ABS - Absolute Value


Purpose:

Removes negative signs from numbers.

Usage:

ABS(number)  

Example:

returns 25

ABS(-25)

Use Case:

Calculating distances, finding differences without caring about direction, cleaning up negative values in reports.

Mathematical functions

CEIL - Round Up


Purpose:

Always rounds numbers up to the next whole number.

Usage:

CEIL(number)

Example:

Returns 5

CEIL(4.1)

Use Case:

Used for calculating minimum packages needed, determining required resources, and inventory planning.

FLOOR - Round Down


Purpose:

Always rounds numbers down to the previous whole number.

Usage:

FLOOR(number)

Example:

Returns 4

FLOOR(4.9)

Use Case:

Useful for calculating completed units, determining available capacity, and budget calculations.

ROUND - Smart Rounding


Purpose:

Rounds numbers to the nearest whole number or specified decimal places.

Usage:

ROUND(number, decimal_places)

Example:

Returns 123.46

ROUND(123.456, 2)

Use Case:

Applied in financial calculations, displaying prices, and creating clean reports.

POWER - Exponents


Purpose:

Multiplies a number by itself a specified number of times.

Usage:

POWER(base, exponent)

Example:

Returns 8 (2×2×2)

POWER(2, 3)

Use Case:

Used for compound interest calculations, growth projections, and area/volume calculations.

SQRT - Square Root


Purpose:

Finds what number multiplied by itself equals the given number.

Usage:

SQRT(number)

Example:

Returns 4

SQRT(16)

Use Case:

Applied in statistical calculations, geometric measurements, and risk analysis.

LN - Natural Logarithm


Purpose:

Mathematical function for advanced calculations and modeling.

Usage:

LN(number)

Example:

Returns 2.302585

LN(10)

Use Case:

Used in scientific calculations, decay models, and financial modeling.

PI - Pi Constant


Purpose:

Returns the mathematical constant $\pi$ (3.14159...).

Usage:

PI()

Example:

Returns 3.14159

PI()

Use Case:

Used in circle calculations, geometric formulas, and engineering calculations.

MOD - Remainder


Purpose:

Shows what's left over after division.

Usage:

MOD(dividend, divisor)

Example:

Returns 1 (10÷3 = 3 remainder 1)

MOD(10, 3)

Use Case:

Useful for creating alternating patterns, pagination, and scheduling rotations.


Statistical functions

AVG - Average


Purpose:

Calculates the mean value of a group of numbers.

Usage:

AVG(column_name)

Example:

Might return 50000

AVG(salary)

Use Case:

Used for performance metrics, salary analysis, quality scores, and customer ratings.

SUM - Total


Purpose:

Adds up all numbers in a group.

Usage:

SUM(column_name)

Example:

Might return 125000

SUM(sales)

Use Case:

Used for revenue totals, expense calculations, inventory counts, and performance totals.

COUNT - Count Items


Purpose:

Counts how many items exist (ignores empty values).

Usage:

COUNT(column_name)

Example:

Might return 150

COUNT(customer_id)

Use Case:

Used for customer counts, order quantities, active users, and completion rates.

MIN - Smallest Value


Purpose:

Finds the lowest number in a group.

Usage:

MIN(column_name)

Example:

Might return 9.99

MIN(price)

Use Case:

Used for lowest prices, minimum scores, earliest dates, and budget floors.

MAX - Largest Value


Purpose:

Finds the highest number in a group.

Usage:

MAX(column_name)

Example:

Might return 98

MAX(score)

Use Case:

Used for peak performance, highest prices, latest dates, and record highs.

MEDIAN - Middle Value


Purpose:

Finds the middle number when all values are arranged in order.

Usage:

MEDIAN(column_name)

Example:

Might return 45000

MEDIAN(income)

Use Case:

Used for salary benchmarks, performance baselines, and market analysis.

STDDEV - Standard Deviation


Purpose:

Measures how spread out numbers are from the average.

Usage:

STDDEV(column_name)

Example:

Might return 2.5

STDDEV(response_time)

Use Case:

Used for quality control, performance consistency, and risk assessment.

VARIANCE - Variance


Purpose:

Measures how much numbers vary from the average (squared differences).

Usage:

VARIANCE(column_name)

Example:

Might return 1250000

VARIANCE(sales)

Use Case:

Used for financial analysis, quality metrics, and forecasting accuracy.

Text functions

CONCAT - Join Text


Purpose:

Combines multiple pieces of text into one.

Usage:

CONCAT(text1, text2, text3...)

Example:

Returns "Hello World"

CONCAT('Hello', ' ', 'World')

Use Case:

Used for creating full names, building addresses, generating reports, and email formatting.

UPPER - Make Uppercase

Purpose:

Converts all letters to capital letters.

Usage:

UPPER(text)

Example:

Returns "HELLO"

UPPER('hello')

Use Case:

Used for standardizing data entry, creating consistent formats, and report headers.

LOWER - Make Lowercase


Purpose:

Converts all letters to small letters.

Usage:

LOWER(text)

Example:

Returns "hello"

LOWER('HELLO')

Use Case:

Used for email addresses, usernames, data cleaning, and search functions.

INITCAP - Capitalize Words

Purpose:

Makes the first letter of each word uppercase.

Usage:

INITCAP(text)

Example:

Returns "John Smith"

INITCAP('john smith')

Use Case:

Used for proper names, titles, address formatting, and professional correspondence.

TRIM - Remove Extra Spaces

Purpose:

Removes spaces from the beginning and end of text.

Usage:

TRIM(text)

Example:

Returns "Hello World"

TRIM(' Hello World ')

Use Case:

Used for data cleaning, form processing, and import/export cleanup.

LTRIM - Remove Left Spaces


Purpose:

Removes spaces only from the beginning of the text.

Usage:

LTRIM(text)

Example:

Returns "Hello"

LTRIM(' Hello')

Use Case:

Used for data formatting, code cleanup, and text alignment.

SUBSTRING - Extract Text Portion

Purpose:

Takes out a specific part of the text.

Usage:

SUBSTRING(text, start_position, length)

Example:

Returns "atab"

SUBSTRING('Database', 2, 4)

Use Case:

Used for extracting codes, parsing data, creating abbreviations, and ID processing.

RIGHT - Get End Characters


Purpose:

Takes a specified number of characters from the end of the text.

Usage:

RIGHT(text, number_of_characters)

Example:

Returns "base"

RIGHT('Database', 4)

Use Case:

Used for file extensions, postal codes, ID suffixes, and validation codes.

REGEXP_REPLACE - Pattern Replacement


Purpose:

Finds and replaces text patterns using advanced matching rules.

Usage:

REGEXP_REPLACE(text, pattern, replacement, flags)

Example:

Returns 1234567890

REGEXP_REPLACE('123-456-7890', '-', '', 'g')

Use Case:

Used for phone number formatting, data cleaning, removing special characters, and standardizing formats.


Date functions

CURRENT_DATE - Today's Date


Purpose:

Returns today's date.

Usage:

CURRENT_DATE

Example:

Returns 2025-01-22

CURRENT_DATE

Use Case:

Used for timestamping records, calculating ages, due date calculations, and report dating.

CURRENT_TIMESTAMP - Current Date and Time

Purpose:

Returns the exact current date and time.

Usage:

CURRENT_TIMESTAMP

Example:

Returns 2025-01-22 10:15:30

CURRENT_TIMESTAMP

Use Case:

Used for transaction logging, audit trails, real-time reporting, and activity tracking.

AGE - Calculate Time Difference


Purpose:

Calculates the number of days between two dates.

Usage:

AGE(end_date, start_date)

Example:

Returns 7

AGE('2025-01-22', '2025-01-15')

Use Case:

Used for customer tenure, project duration, payment delays, and service periods.

DATE_PART - Extract Date Component


Purpose:

Pulls out specific parts of a date (year, month, day, etc.).

Usage:

DATE_PART('part', date)

Example:

Returns 2025

DATE_PART('year', '2025-01-22')

Use Case:

Used for monthly reports, yearly summaries, seasonal analysis, and age calculations.

DATE_TRUNC - Round Date


Purpose:

Rounds dates down to a specific time period.

Usage:

DATE_TRUNC('precision', date)

Example:

Returns 2025-01-01

DATE_TRUNC('month', '2025-01-22')

Use Case:

Used for monthly grouping, quarterly reports, trend analysis, and data aggregation.

Utility functions

COALESCE - Handle Missing Data


Purpose:

Returns the first non-empty value from a list.

Usage:

COALESCE(value1, value2, value3...)

Example:

Returns "Fallback"

COALESCE(NULL, 'Fallback', 'Default')

Use Case:

Used for default values, handling missing data, report formatting, and data validation.

CASE WHEN - Conditional Logic

Purpose:

Creates an if-then-else logic for different scenarios.

Usage:

CASE WHEN condition1 THEN result1 WHEN condition2 THEN result2 ELSE default_result END

Example:

Assigns 'A' if score >= 90, 'B' if score >= 80, else 'C'

CASE WHEN score >= 90 THEN 'A' WHEN score >= 80 THEN 'B' ELSE 'C' END

Use Case:

Used for grade calculations, status assignments, categorization, conditional formatting, and business rules.

Best practices

Naming conventions


Column and table names

--  ✅  GOOD:  Clear,  descriptive  names
SELECT customer_first_name, order_total_amount
FROM customer_orders

-- ❌ AVOID: Cryptic abbreviations
SELECT cust_fn, ord_tot_amt
FROM cust_ord

Formula aliases

--  ✅  GOOD:  Meaningful  aliases
SELECT
CONCAT(first_name, ' ', last_name) AS full_name,
AVG(salary) AS average_salary,
COUNT(*) AS total_employees

-- ❌ AVOID: Generic or no aliases
SELECT
CONCAT(first_name, ' ', last_name),
AVG(salary) AS avg1,
COUNT(*) AS cnt

Code formatting


Readable structure

-- ✅ GOOD: Well-formatted and indented
SELECT
department_name,
COUNT(employee_id) AS employee_count,
AVG(salary) AS average_salary,
MAX(hire_date) AS latest_hire_date
FROM employees e
JOIN departments d ON e.department_id = d.department_id
WHERE hire_date >= '2020-01-01'
GROUP BY department_name
ORDER BY average_salary DESC;

-- ❌ AVOID: Everything on one line
SELECT department_name,COUNT(employee_id),AVG(salary),MAX(hire_date) FROM employees e JOIN departments d ON e.department_id = d.department_id WHERE hire_date >= '2020-01-01' GROUP BY department_name ORDER BY AVG(salary) DESC;

Function spacing

-- ✅ GOOD: Consistent spacing
CASE
WHEN score >= 90 THEN 'Excellent'
WHEN score >= 80 THEN 'Good'
WHEN score >= 70 THEN 'Average'
ELSE 'Needs Improvement'
END AS performance_rating

-- ❌ AVOID: Inconsistent spacing
CASE WHEN score>=90 THEN'Excellent'WHEN score>=80 THEN 'Good'ELSE'Poor'END

Performance optimization


Use appropriate functions

-- ✅ GOOD: Efficient for large datasets
SELECT
DATE_TRUNC('month', order_date) AS order_month,
SUM(total_amount) AS monthly_revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date);

-- ❌ AVOID: Less efficient
SELECT
SUBSTRING(order_date::text, 1, 7) AS order_month,
SUM(total_amount) AS monthly_revenue
FROM orders
GROUP BY SUBSTRING(order_date::text, 1, 7);

Minimize function calls in WHERE clauses

-- ✅ GOOD: Index-friendly
SELECT * FROM orders
WHERE order_date >= '2024-01-01'
AND order_date < '2025-01-01';

-- ❌ AVOID: Prevents index usage
SELECT * FROM orders
WHERE DATE_PART('year', order_date) = 2024;

Null handling


Always plan for null values

-- ✅ GOOD: Explicit NULL handling
SELECT
customer_name,
COALESCE(phone_number, 'No phone provided') AS contact_phone,
COALESCE(email, 'No email provided') AS contact_email
FROM customers;

-- ✅ GOOD: Safe division
SELECT
product_name,
CASE
WHEN total_orders > 0
THEN ROUND(total_revenue / total_orders, 2)
ELSE 0
END AS average_order_value
FROM product_summary;

COUNT function considerations

-- ✅ GOOD: Understand the difference
SELECT
COUNT(*) AS total_rows, -- Counts all rows
COUNT(email) AS customers_with_email, -- Counts non-NULL emails
COUNT(DISTINCT city) AS unique_cities -- Counts unique non-NULL cities
FROM customers;

Data type awareness


Consistent data types

-- ✅ GOOD: Proper type handling
SELECT
order_id,
ROUND(total_amount, 2) AS formatted_total, -- Keep as numeric
TO_CHAR(order_date, 'YYYY-MM-DD') AS formatted_date -- Convert to text when needed
FROM orders;

-- ❌ AVOID: Unnecessary type conversions
SELECT
order_id::text, -- Unnecessary conversion
total_amount::text -- Loses numeric properties
FROM orders;

Date handling

-- ✅ GOOD: Timezone aware
SELECT
order_date,
DATE_TRUNC('day', order_date AT TIME ZONE 'UTC') AS order_day_utc
FROM orders;

-- ✅ GOOD: Clear date arithmetic
SELECT
customer_id,
registration_date,
AGE(CURRENT_DATE, registration_date) AS customer_tenure_days
FROM customers;

Conditional logic best practices


Clear CASE statements

-- ✅ GOOD: Comprehensive and readable
SELECT
employee_id,
salary,
CASE
WHEN salary >= 100000 THEN 'Senior'
WHEN salary >= 75000 THEN 'Mid-Level'
WHEN salary >= 50000 THEN 'Junior'
WHEN salary >= 30000 THEN 'Entry-Level'
ELSE 'Intern'
END AS salary_band,
CASE
WHEN department_id IS NULL THEN 'Unassigned'
ELSE department_name
END AS department_display
FROM employees;

Nested functions

-- ✅ GOOD: Logical nesting with comments
SELECT
customer_id,
-- Clean and format customer names
INITCAP(
TRIM(
COALESCE(customer_name, 'Unknown Customer')
)
) AS formatted_customer_name
FROM customers;

Aggregation best practices


Meaningful grouping

-- ✅ GOOD: Clear grouping logic
SELECT
DATE_TRUNC('quarter', order_date) AS quarter,
product_category,
COUNT(DISTINCT customer_id) AS unique_customers,
COUNT(order_id) AS total_orders,
SUM(order_total) AS total_revenue,
AVG(order_total) AS average_order_value,
STDDEV(order_total) AS revenue_volatility
FROM orders o
JOIN products p ON o.product_id = p.product_id
WHERE order_date >= '2024-01-01'
GROUP BY
DATE_TRUNC('quarter', order_date),
product_category
HAVING COUNT(order_id) >= 10 -- Only include categories with significant volume
ORDER BY quarter DESC, total_revenue DESC;

Documentation & comments


Self-documenting queries

-- ✅ GOOD: Clear purpose and logic
-- Customer Lifetime Value Analysis
-- Calculates CLV for customers active in the last 2 years
SELECT
c.customer_id,
c.customer_name,
-- Customer tenure in days
AGE(CURRENT_DATE, c.registration_date) AS customer_age_days,
-- Purchase behavior metrics
COUNT(o.order_id) AS total_orders,
SUM(o.order_total) AS lifetime_revenue,
AVG(o.order_total) AS average_order_value,
-- Recency calculation
AGE(CURRENT_DATE, MAX(o.order_date)) AS days_since last order,
-- Customer lifetime value estimate
CASE
WHEN COUNT(o.order_id) > 0 THEN
ROUND(
SUM(o.order_total) /
NULLIF(AGE(CURRENT_DATE, c.registration_date), 0) * 365,
2
)
ELSE 0
END AS estimated_annual_value
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.registration_date >= CURRENT_DATE - INTERVAL '2 years'
GROUP BY c.customer_id, c.customer_name, c.registration_date
HAVING COUNT(o.order_id) > 0 -- Only customers with purchases
ORDER BY estimated_annual_value DESC;

Error prevention


Defensive programming

-- ✅ GOOD: Prevent division by zero and handle edge cases
SELECT
product_id,
product_name,
total_inventory,
total_sold,
-- Safe percentage calculation
CASE
WHEN total_inventory > 0 THEN
ROUND((total_sold::decimal / total_inventory) * 100, 2)
ELSE 0
END AS sell_through_percentage,
-- Safe average calculation
CASE
WHEN order_count > 0 THEN
ROUND(total_revenue / order_count, 2)
ELSE 0
END AS average order value
FROM product_metrics;

Input validation

-- ✅ GOOD: Validate and clean inputs
SELECT
customer_id,
-- Clean phone numbers
REGEXP_REPLACE(
REGEXP_REPLACE(phone, '[^0-9]', '', 'g'), -- Remove non-digits
'^1?(.{10})$',
'\1'
) AS cleaned_phone,
-- Validate email format
CASE
WHEN email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
THEN LOWER(TRIM(email))
ELSE NULL
END AS validated_email
FROM customers;

Testing & validation


Data quality checks

-- ✅ GOOD: Include data validation in your queries
SELECT
'Customer Data Quality' AS report_type,
COUNT(*) AS total_customers,
COUNT(CASE WHEN email IS NOT NULL THEN 1 END) AS customers_with_email,
COUNT(CASE WHEN phone IS NOT NULL THEN 1 END) AS customers_with_phone,
COUNT(CASE WHEN registration_date > CURRENT_DATE THEN 1 END) AS future registration dates,
AVG(CASE WHEN customer_name IS NOT NULL THEN LENGTH(customer_name) END) AS avg_name_length
FROM customers;

Common anti-patterns to avoid


What not to do

-- ❌ AVOID: Using functions unnecessarily
WHERE UPPER(status) = 'ACTIVE' -- Instead create index on status and use = 'ACTIVE'

-- ❌ AVOID: Complex nested subqueries when JOINs work better
SELECT * FROM customers WHERE customer_id IN (
SELECT customer_id FROM orders WHERE order_date IN (
SELECT MAX(order_date) FROM orders GROUP BY customer_id
)
);

-- ❌ AVOID: Not handling NULLs in aggregations
AVG(rating) -- Might not behave as expected with NULLs

-- ❌ AVOID: Hardcoded values without context
WHERE status_code = 3 -- Use meaningful names: WHERE status = 'ACTIVE'

Summary checklist

Before deploying your query, ensure to check the following:

  • Column names are clear and descriptive

  • Functions have meaningful aliases

  • Code is properly formatted and indented

  • NULL values are explicitly handled

  • Data types are appropriate and consistent

  • Performance implications are considered

  • Edge cases (division by zero, empty sets) are handled

  • Comments explain complex logic

  • Query has been tested with sample data

  • Results are validated for accuracy

Performance review

Review the following before deploying the query:

  • WHERE clauses don't prevent index usage

  • Appropriate functions are used for data types

  • Aggregations are efficient

  • Complex calculations are broken into readable steps

  • Query execution plan has been reviewed