Overdue evolution in SQL

Despite my efforts I cannot think of a way to answer my need :

I have 2 tables containing respectively a set of loans to be reimbursed, and a set of reimbursements on these loans (not all loans have an entry in reimbursement table because some of them remain unpaid).

Table loan has columns:

  • id
  • amount
  • due_date

Table reimbursement has columns:

  • id
  • debt_id
  • payment_date

My goal is to obtain the amount that was overdue for more than 5 days for each date in the past.

Here is where I am at (in Postgres grammar):

with due_for_5_days_by_date AS (
    select 
    due_date + interval '5' DAY as due_date,
    sum(amount) as amount
    FROM loan
    GROUP BY due_date
),
paid_by_date as (
    select 
        payment_date,
        sum(amount) as amount
    FROM reimbursement
    join loan on loan.id = loan_id
    GROUP BY payment_date
),
cumulated as (
    SELECT
        COALESCE(due_date, payment_date) as date,
        SUM(COALESCE(paid_by_date.amount, 0)) over (order by COALESCE(due_date, payment_date)) as paid,
        SUM(COALESCE(due_for_5_days_by_date.amount, 0)) over (order by COALESCE(due_date, payment_date)) as due_5d
    FROM due_for_5_days_by_date
    FULL OUTER JOIN paid_by_date on due_date = payment_date
)
select 
    DATE,
    due_5d - paid as overdue_5d
from cumulated
order by date

Explanation :

  • First I get the amount due for exactly 5 days for each date
  • Then, I get the amount paid for each date
  • Then I sum the amounts obtained at 1. and 2. to get the cumulative due and paid amounts
  • Then by difference I get the overdue amount

But the problem is that, by doing this, the amount paid by date also contains reimbursement of loans expired for less than 5 days. In other words the paid amount for each date in the final query is overestimated as it should exclude the amount received on loans recently expired on that date. I cannot think of a way to fix this.

Can you think of a way to obtain what I want ?

Thanks a lot for your help !

Topic google-bigquery sql

Category Data Science


I just found this way of doing it, but it uses a cross join, so not sure it will scale well on my whole data, even though it is hosted on BigQuery:

with all_dates as (
    select distinct payment_date as date
    from reimbursement
    union
    select distinct due_date as date
    from loan
    union
    select distinct due_date + interval '5' day as date
    from loan
), 
joined_data as (
    SELECT
        amount,
        due_date,
        payment_date
    from loan
    LEFT join reimbursement on loan.id = loan_id
)
select 
    date,
    SUM(
      CASE
        WHEN (due_date <= DATE - INTERVAL '5' day) and (payment_date > DATE or payment_date is null) then amount
        else 0
      END
    ) as overdue_5d
FROM all_dates
cross join joined_data
group by date
order by DATE

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.