Pitfalls

  • There is no == in SQL, use =
  • No trailing commas
  • No using of AND in ORDER BY and GROUP BY clauses- A JOIN is followed by a ON not a WHERE unless intentional
  • Numerical equality Conditions inside count are always counted as 1 (can be used inside AVG though)

Functions

  1. Length - of a VARCHAR field
SELECT
	tweet_id
FROM
	Tweets
WHERE
	LENGTH(content) > 15
;
  1. Count
SELECT
  T1.customer_id,
  count(T1.visit_id) as count_no_trans
FROM
  Visits as T1
  LEFT JOIN Transactions as T2
ON 
  T1.visit_id = T2.visit_id
WHERE
  T2.transaction_id IS NULL
GROUP BY
  T1.customer_id

You can use AND within the count function

  1. Datediff

Given a field recordDate of datatype date,

SELECT 
  T2.id as id
FROM
  Weather as T1
  JOIN Weather as T2
WHERE
  DATEDIFF(T2.recordDate, T1.recordDate) = 1
AND
  T1.temperature < T2.temperature
  1. avg
# Write your MySQL query statement below
SELECT 
  T1.machine_id,
  avg(T2.timestamp - T1.timestamp) AS processing_time
FROM
  Activity as T1
  JOIN Activity as T2
ON
  T1.machine_id = T2.machine_id 
  AND T1.process_id = T2.process_id
  AND T1.activity_type = 'start'
  AND T2.activity_type = 'end'
GROUP BY
  T1.machine_id
  
# Another example
 
SELECT
    ROUND(
        AVG(order_date=customer_pref_delivery_date) * 100
    , 2) as immediate_percentage
FROM
    Delivery
WHERE
    (customer_id, order_date) IN (SELECT customer_id, min(order_date) FROM Delivery GROUP BY customer_id)
  1. Between
SELECT
    T1.product_id as product_id,
    ROUND(SUM(T1.price * T2.units) / SUM(T2.units), 2) AS average_price
FROM
    Prices AS T1
    JOIN UnitsSold AS T2
    ON T1.product_id = T2.product_id
WHERE
    T2.purchase_date BETWEEN T1.start_date AND T1.end_date
GROUP BY
    T1.product_id
  1. CASE WHEN
    SELECT
     query_name,
     ROUND(
         SUM(rating / position) / COUNT(query_name)
     , 2) as quality,
      ROUND(
          SUM(CASE WHEN rating < 3 THEN 1 ELSE 0 END) * 100 / COUNT(query_name)
     , 2) as poor_query_percentage

FROM Queries GROUP BY query_name


7. Date_format
   ```sql
   SELECT
    DATE_FORMAT(trans_date, "%Y-%m") as month,
    country,
    COUNT(*) AS trans_count,
    SUM(CASE WHEN state="approved" THEN 1 ELSE 0 END) as approved_count,
    SUM(amount) AS trans_total_amount,
    SUM(CASE WHEN state="approved" THEN amount ELSE 0 END) as approved_total_amount
FROM
    Transactions
GROUP BY
    country, DATE_FORMAT(trans_date, "%Y-%m")
;

Using WHERE vs ON

SELECT * FROM table1 JOIN table2 ON table1.id = table2.id;
SELECT * FROM table1 JOIN table2 WHERE table1.id = table2.id;

Both the above commands return the same result, but using ON is more sensible as it’s clear to the reader that the condition is used upon joining the two tables. WHERE is more suitable when we are trying to apply a filter on the fetched data.

Is there a performance difference though?

No. SQL comes up with a query plan for every user written query, in this case both queries result in the same query plan.

Example usage,

SELECT 
  T1.machine_id,
  round(avg(T2.timestamp - T1.timestamp), 3) AS processing_time
FROM
  Activity as T1
  JOIN Activity as T2
ON
  T1.machine_id = T2.machine_id 
  AND T1.process_id = T2.process_id
WHERE
  T1.activity_type = 'start'
  AND T2.activity_type = 'end'
GROUP BY
  T1.machine_id

Having

SELECT 
    T2.name as name
FROM
    Employee as T1
    JOIN Employee as T2
    ON T1.managerId = T2.id 
GROUP BY
    T1.managerId
HAVING 
    COUNT(T1.managerId) >= 5
 

Show of strength

  • Use <> instead of != (Both works though)
  • Format the query in multiple lines with indentation

Note JOIN is by default an INNER JOIN

Refs

  1. https://leetcode.com/studyplan/top-sql-50/