JDBC formula reference
Overview
This document provides a comprehensive reference for JDBC formulas supported in DataGOL. These formulas enable powerful data processing, analysis, and transformation capabilities directly within your DataGOL workflows.
What you can achieve?
-
Data aggregation: Calculate sums, averages, counts, and statistical measures
-
Text processing: Clean, format, and manipulate string data
-
Date/Time operations: Parse, format, and calculate with dates and timestamps
-
Conditional logic: Implement business rules and data validation
-
Mathematical operations: Perform calculations and statistical analysis
Mathematical functions
ABS - Absolute Value
Purpose:
Removes negative signs from numbers.
Usage:
ABS(number)
Example:
returns 25
ABS(-25)
Use Case:
Calculating distances, finding differences without caring about direction, cleaning up negative values in reports.
Mathematical functions
CEIL - Round Up
Purpose:
Always rounds numbers up to the next whole number.
Usage:
CEIL(number)
Example:
Returns 5
CEIL(4.1)
Use Case:
Used for calculating minimum packages needed, determining required resources, and inventory planning.
FLOOR - Round Down
Purpose:
Always rounds numbers down to the previous whole number.
Usage:
FLOOR(number)
Example:
Returns 4
FLOOR(4.9)
Use Case:
Useful for calculating completed units, determining available capacity, and budget calculations.
ROUND - Smart Rounding
Purpose:
Rounds numbers to the nearest whole number or specified decimal places.
Usage:
ROUND(number, decimal_places)
Example:
Returns 123.46
ROUND(123.456, 2)
Use Case:
Applied in financial calculations, displaying prices, and creating clean reports.
POWER - Exponents
Purpose:
Multiplies a number by itself a specified number of times.
Usage:
POWER(base, exponent)
Example:
Returns 8 (2×2×2)
POWER(2, 3)
Use Case:
Used for compound interest calculations, growth projections, and area/volume calculations.
SQRT - Square Root
Purpose:
Finds what number multiplied by itself equals the given number.
Usage:
SQRT(number)
Example:
Returns 4
SQRT(16)
Use Case:
Applied in statistical calculations, geometric measurements, and risk analysis.
LN - Natural Logarithm
Purpose:
Mathematical function for advanced calculations and modeling.
Usage:
LN(number)
Example:
Returns 2.302585
LN(10)
Use Case:
Used in scientific calculations, decay models, and financial modeling.
PI - Pi Constant
Purpose:
Returns the mathematical constant $\pi$ (3.14159...).
Usage:
PI()
Example:
Returns 3.14159
PI()
Use Case:
Used in circle calculations, geometric formulas, and engineering calculations.
MOD - Remainder
Purpose:
Shows what's left over after division.
Usage:
MOD(dividend, divisor)
Example:
Returns 1 (10÷3 = 3 remainder 1)
MOD(10, 3)
Use Case:
Useful for creating alternating patterns, pagination, and scheduling rotations.
Statistical functions
AVG - Average
Purpose:
Calculates the mean value of a group of numbers.
Usage:
AVG(column_name)
Example:
Might return 50000
AVG(salary)
Use Case:
Used for performance metrics, salary analysis, quality scores, and customer ratings.
SUM - Total
Purpose:
Adds up all numbers in a group.
Usage:
SUM(column_name)
Example:
Might return 125000
SUM(sales)
Use Case:
Used for revenue totals, expense calculations, inventory counts, and performance totals.
COUNT - Count Items
Purpose:
Counts how many items exist (ignores empty values).
Usage:
COUNT(column_name)
Example:
Might return 150
COUNT(customer_id)
Use Case:
Used for customer counts, order quantities, active users, and completion rates.
MIN - Smallest Value
Purpose:
Finds the lowest number in a group.
Usage:
MIN(column_name)
Example:
Might return 9.99
MIN(price)
Use Case:
Used for lowest prices, minimum scores, earliest dates, and budget floors.
MAX - Largest Value
Purpose:
Finds the highest number in a group.
Usage:
MAX(column_name)
Example:
Might return 98
MAX(score)
Use Case:
Used for peak performance, highest prices, latest dates, and record highs.
MEDIAN - Middle Value
Purpose:
Finds the middle number when all values are arranged in order.
Usage:
MEDIAN(column_name)
Example:
Might return 45000
MEDIAN(income)
Use Case:
Used for salary benchmarks, performance baselines, and market analysis.
STDDEV - Standard Deviation
Purpose:
Measures how spread out numbers are from the average.
Usage:
STDDEV(column_name)
Example:
Might return 2.5
STDDEV(response_time)
Use Case:
Used for quality control, performance consistency, and risk assessment.
VARIANCE - Variance
Purpose:
Measures how much numbers vary from the average (squared differences).
Usage:
VARIANCE(column_name)
Example:
Might return 1250000
VARIANCE(sales)
Use Case:
Used for financial analysis, quality metrics, and forecasting accuracy.
Text functions
CONCAT - Join Text
Purpose:
Combines multiple pieces of text into one.
Usage:
CONCAT(text1, text2, text3...)
Example:
Returns "Hello World"
CONCAT('Hello', ' ', 'World')
Use Case:
Used for creating full names, building addresses, generating reports, and email formatting.
UPPER - Make Uppercase
Purpose:
Converts all letters to capital letters.
Usage:
UPPER(text)
Example:
Returns "HELLO"
UPPER('hello')
Use Case:
Used for standardizing data entry, creating consistent formats, and report headers.
LOWER - Make Lowercase
Purpose:
Converts all letters to small letters.
Usage:
LOWER(text)
Example:
Returns "hello"
LOWER('HELLO')
Use Case:
Used for email addresses, usernames, data cleaning, and search functions.
INITCAP - Capitalize Words
Purpose:
Makes the first letter of each word uppercase.
Usage:
INITCAP(text)
Example:
Returns "John Smith"
INITCAP('john smith')
Use Case:
Used for proper names, titles, address formatting, and professional correspondence.
TRIM - Remove Extra Spaces
Purpose:
Removes spaces from the beginning and end of text.
Usage:
TRIM(text)
Example:
Returns "Hello World"
TRIM(' Hello World ')
Use Case:
Used for data cleaning, form processing, and import/export cleanup.
LTRIM - Remove Left Spaces
Purpose:
Removes spaces only from the beginning of the text.
Usage:
LTRIM(text)
Example:
Returns "Hello"
LTRIM(' Hello')
Use Case:
Used for data formatting, code cleanup, and text alignment.
SUBSTRING - Extract Text Portion
Purpose:
Takes out a specific part of the text.
Usage:
SUBSTRING(text, start_position, length)
Example:
Returns "atab"
SUBSTRING('Database', 2, 4)
Use Case:
Used for extracting codes, parsing data, creating abbreviations, and ID processing.
RIGHT - Get End Characters
Purpose:
Takes a specified number of characters from the end of the text.
Usage:
RIGHT(text, number_of_characters)
Example:
Returns "base"
RIGHT('Database', 4)
Use Case:
Used for file extensions, postal codes, ID suffixes, and validation codes.
REGEXP_REPLACE - Pattern Replacement
Purpose:
Finds and replaces text patterns using advanced matching rules.
Usage:
REGEXP_REPLACE(text, pattern, replacement, flags)
Example:
Returns 1234567890
REGEXP_REPLACE('123-456-7890', '-', '', 'g')
Use Case:
Used for phone number formatting, data cleaning, removing special characters, and standardizing formats.
Date functions
CURRENT_DATE - Today's Date
Purpose:
Returns today's date.
Usage:
CURRENT_DATE
Example:
Returns 2025-01-22
CURRENT_DATE
Use Case:
Used for timestamping records, calculating ages, due date calculations, and report dating.
CURRENT_TIMESTAMP - Current Date and Time
Purpose:
Returns the exact current date and time.
Usage:
CURRENT_TIMESTAMP
Example:
Returns 2025-01-22 10:15:30
CURRENT_TIMESTAMP
Use Case:
Used for transaction logging, audit trails, real-time reporting, and activity tracking.
AGE - Calculate Time Difference
Purpose:
Calculates the number of days between two dates.
Usage:
AGE(end_date, start_date)
Example:
Returns 7
AGE('2025-01-22', '2025-01-15')
Use Case:
Used for customer tenure, project duration, payment delays, and service periods.
DATE_PART - Extract Date Component
Purpose:
Pulls out specific parts of a date (year, month, day, etc.).
Usage:
DATE_PART('part', date)
Example:
Returns 2025
DATE_PART('year', '2025-01-22')
Use Case:
Used for monthly reports, yearly summaries, seasonal analysis, and age calculations.
DATE_TRUNC - Round Date
Purpose:
Rounds dates down to a specific time period.
Usage:
DATE_TRUNC('precision', date)
Example:
Returns 2025-01-01
DATE_TRUNC('month', '2025-01-22')
Use Case:
Used for monthly grouping, quarterly reports, trend analysis, and data aggregation.
Utility functions
COALESCE - Handle Missing Data
Purpose:
Returns the first non-empty value from a list.
Usage:
COALESCE(value1, value2, value3...)
Example:
Returns "Fallback"
COALESCE(NULL, 'Fallback', 'Default')
Use Case:
Used for default values, handling missing data, report formatting, and data validation.
CASE WHEN - Conditional Logic
Purpose:
Creates an if-then-else logic for different scenarios.
Usage:
CASE WHEN condition1 THEN result1 WHEN condition2 THEN result2 ELSE default_result END
Example:
Assigns 'A' if score >= 90, 'B' if score >= 80, else 'C'
CASE WHEN score >= 90 THEN 'A' WHEN score >= 80 THEN 'B' ELSE 'C' END
Use Case:
Used for grade calculations, status assignments, categorization, conditional formatting, and business rules.
Best practices
Naming conventions
Column and table names
-- ✅ GOOD: Clear, descriptive names
SELECT customer_first_name, order_total_amount
FROM customer_orders
-- ❌ AVOID: Cryptic abbreviations
SELECT cust_fn, ord_tot_amt
FROM cust_ord
Formula aliases
-- ✅ GOOD: Meaningful aliases
SELECT
CONCAT(first_name, ' ', last_name) AS full_name,
AVG(salary) AS average_salary,
COUNT(*) AS total_employees
-- ❌ AVOID: Generic or no aliases
SELECT
CONCAT(first_name, ' ', last_name),
AVG(salary) AS avg1,
COUNT(*) AS cnt
Code formatting
Readable structure
-- ✅ GOOD: Well-formatted and indented
SELECT
department_name,
COUNT(employee_id) AS employee_count,
AVG(salary) AS average_salary,
MAX(hire_date) AS latest_hire_date
FROM employees e
JOIN departments d ON e.department_id = d.department_id
WHERE hire_date >= '2020-01-01'
GROUP BY department_name
ORDER BY average_salary DESC;
-- ❌ AVOID: Everything on one line
SELECT department_name,COUNT(employee_id),AVG(salary),MAX(hire_date) FROM employees e JOIN departments d ON e.department_id = d.department_id WHERE hire_date >= '2020-01-01' GROUP BY department_name ORDER BY AVG(salary) DESC;
Function spacing
-- ✅ GOOD: Consistent spacing
CASE
WHEN score >= 90 THEN 'Excellent'
WHEN score >= 80 THEN 'Good'
WHEN score >= 70 THEN 'Average'
ELSE 'Needs Improvement'
END AS performance_rating
-- ❌ AVOID: Inconsistent spacing
CASE WHEN score>=90 THEN'Excellent'WHEN score>=80 THEN 'Good'ELSE'Poor'END
Performance optimization
Use appropriate functions
-- ✅ GOOD: Efficient for large datasets
SELECT
DATE_TRUNC('month', order_date) AS order_month,
SUM(total_amount) AS monthly_revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date);
-- ❌ AVOID: Less efficient
SELECT
SUBSTRING(order_date::text, 1, 7) AS order_month,
SUM(total_amount) AS monthly_revenue
FROM orders
GROUP BY SUBSTRING(order_date::text, 1, 7);
Minimize function calls in WHERE clauses
-- ✅ GOOD: Index-friendly
SELECT * FROM orders
WHERE order_date >= '2024-01-01'
AND order_date < '2025-01-01';
-- ❌ AVOID: Prevents index usage
SELECT * FROM orders
WHERE DATE_PART('year', order_date) = 2024;
Null handling
Always plan for null values
-- ✅ GOOD: Explicit NULL handling
SELECT
customer_name,
COALESCE(phone_number, 'No phone provided') AS contact_phone,
COALESCE(email, 'No email provided') AS contact_email
FROM customers;
-- ✅ GOOD: Safe division
SELECT
product_name,
CASE
WHEN total_orders > 0
THEN ROUND(total_revenue / total_orders, 2)
ELSE 0
END AS average_order_value
FROM product_summary;
COUNT function considerations
-- ✅ GOOD: Understand the difference
SELECT
COUNT(*) AS total_rows, -- Counts all rows
COUNT(email) AS customers_with_email, -- Counts non-NULL emails
COUNT(DISTINCT city) AS unique_cities -- Counts unique non-NULL cities
FROM customers;
Data type awareness
Consistent data types
-- ✅ GOOD: Proper type handling
SELECT
order_id,
ROUND(total_amount, 2) AS formatted_total, -- Keep as numeric
TO_CHAR(order_date, 'YYYY-MM-DD') AS formatted_date -- Convert to text when needed
FROM orders;
-- ❌ AVOID: Unnecessary type conversions
SELECT
order_id::text, -- Unnecessary conversion
total_amount::text -- Loses numeric properties
FROM orders;
Date handling
-- ✅ GOOD: Timezone aware
SELECT
order_date,
DATE_TRUNC('day', order_date AT TIME ZONE 'UTC') AS order_day_utc
FROM orders;
-- ✅ GOOD: Clear date arithmetic
SELECT
customer_id,
registration_date,
AGE(CURRENT_DATE, registration_date) AS customer_tenure_days
FROM customers;
Conditional logic best practices
Clear CASE statements
-- ✅ GOOD: Comprehensive and readable
SELECT
employee_id,
salary,
CASE
WHEN salary >= 100000 THEN 'Senior'
WHEN salary >= 75000 THEN 'Mid-Level'
WHEN salary >= 50000 THEN 'Junior'
WHEN salary >= 30000 THEN 'Entry-Level'
ELSE 'Intern'
END AS salary_band,
CASE
WHEN department_id IS NULL THEN 'Unassigned'
ELSE department_name
END AS department_display
FROM employees;
Nested functions
-- ✅ GOOD: Logical nesting with comments
SELECT
customer_id,
-- Clean and format customer names
INITCAP(
TRIM(
COALESCE(customer_name, 'Unknown Customer')
)
) AS formatted_customer_name
FROM customers;
Aggregation best practices
Meaningful grouping
-- ✅ GOOD: Clear grouping logic
SELECT
DATE_TRUNC('quarter', order_date) AS quarter,
product_category,
COUNT(DISTINCT customer_id) AS unique_customers,
COUNT(order_id) AS total_orders,
SUM(order_total) AS total_revenue,
AVG(order_total) AS average_order_value,
STDDEV(order_total) AS revenue_volatility
FROM orders o
JOIN products p ON o.product_id = p.product_id
WHERE order_date >= '2024-01-01'
GROUP BY
DATE_TRUNC('quarter', order_date),
product_category
HAVING COUNT(order_id) >= 10 -- Only include categories with significant volume
ORDER BY quarter DESC, total_revenue DESC;
Documentation & comments
Self-documenting queries
-- ✅ GOOD: Clear purpose and logic
-- Customer Lifetime Value Analysis
-- Calculates CLV for customers active in the last 2 years
SELECT
c.customer_id,
c.customer_name,
-- Customer tenure in days
AGE(CURRENT_DATE, c.registration_date) AS customer_age_days,
-- Purchase behavior metrics
COUNT(o.order_id) AS total_orders,
SUM(o.order_total) AS lifetime_revenue,
AVG(o.order_total) AS average_order_value,
-- Recency calculation
AGE(CURRENT_DATE, MAX(o.order_date)) AS days_since last order,
-- Customer lifetime value estimate
CASE
WHEN COUNT(o.order_id) > 0 THEN
ROUND(
SUM(o.order_total) /
NULLIF(AGE(CURRENT_DATE, c.registration_date), 0) * 365,
2
)
ELSE 0
END AS estimated_annual_value
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.registration_date >= CURRENT_DATE - INTERVAL '2 years'
GROUP BY c.customer_id, c.customer_name, c.registration_date
HAVING COUNT(o.order_id) > 0 -- Only customers with purchases
ORDER BY estimated_annual_value DESC;
Error prevention
Defensive programming
-- ✅ GOOD: Prevent division by zero and handle edge cases
SELECT
product_id,
product_name,
total_inventory,
total_sold,
-- Safe percentage calculation
CASE
WHEN total_inventory > 0 THEN
ROUND((total_sold::decimal / total_inventory) * 100, 2)
ELSE 0
END AS sell_through_percentage,
-- Safe average calculation
CASE
WHEN order_count > 0 THEN
ROUND(total_revenue / order_count, 2)
ELSE 0
END AS average order value
FROM product_metrics;
Input validation
-- ✅ GOOD: Validate and clean inputs
SELECT
customer_id,
-- Clean phone numbers
REGEXP_REPLACE(
REGEXP_REPLACE(phone, '[^0-9]', '', 'g'), -- Remove non-digits
'^1?(.{10})$',
'\1'
) AS cleaned_phone,
-- Validate email format
CASE
WHEN email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
THEN LOWER(TRIM(email))
ELSE NULL
END AS validated_email
FROM customers;
Testing & validation
Data quality checks
-- ✅ GOOD: Include data validation in your queries
SELECT
'Customer Data Quality' AS report_type,
COUNT(*) AS total_customers,
COUNT(CASE WHEN email IS NOT NULL THEN 1 END) AS customers_with_email,
COUNT(CASE WHEN phone IS NOT NULL THEN 1 END) AS customers_with_phone,
COUNT(CASE WHEN registration_date > CURRENT_DATE THEN 1 END) AS future registration dates,
AVG(CASE WHEN customer_name IS NOT NULL THEN LENGTH(customer_name) END) AS avg_name_length
FROM customers;
Common anti-patterns to avoid
What not to do
-- ❌ AVOID: Using functions unnecessarily
WHERE UPPER(status) = 'ACTIVE' -- Instead create index on status and use = 'ACTIVE'
-- ❌ AVOID: Complex nested subqueries when JOINs work better
SELECT * FROM customers WHERE customer_id IN (
SELECT customer_id FROM orders WHERE order_date IN (
SELECT MAX(order_date) FROM orders GROUP BY customer_id
)
);
-- ❌ AVOID: Not handling NULLs in aggregations
AVG(rating) -- Might not behave as expected with NULLs
-- ❌ AVOID: Hardcoded values without context
WHERE status_code = 3 -- Use meaningful names: WHERE status = 'ACTIVE'
Summary checklist
Before deploying your query, ensure to check the following:
-
Column names are clear and descriptive
-
Functions have meaningful aliases
-
Code is properly formatted and indented
-
NULL values are explicitly handled
-
Data types are appropriate and consistent
-
Performance implications are considered
-
Edge cases (division by zero, empty sets) are handled
-
Comments explain complex logic
-
Query has been tested with sample data
-
Results are validated for accuracy
Performance review
Review the following before deploying the query:
-
WHERE
clauses don't prevent index usage -
Appropriate functions are used for data types
-
Aggregations are efficient
-
Complex calculations are broken into readable steps
-
Query execution plan has been reviewed