You are correct. They are not independent and they are positively correlated.
Let $A$ and $B$ (or $X$ and $Y$) be two events for stating general theorems and $M$ and $T$ be the events "customer purchases mozzarella" and "customer purchases tomatoes" in this specific example. We will use $\wedge$ to mean "and" so that $M \wedge T$ is the event "customer purchases mozzarella and tomatoes".
The simplest way to check independence is directly from the definition. $A$ and $B$ are independent if $P(A \wedge B) = P(A)P(B)$. From the data table we have
$$P(M \wedge T) = 2000 / 5000 = 0.4$$
$$P(M) = 2500 / 5000 = 0.5$$
$$P(T) = 3000 / 5000 = 0.6$$
$$P(M)P(T) = 0.5 \times 0.6 = 0.3$$
Since 0.4 does not equal 0.3 we conclude that the events are not independent.
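If you want to check this numerically, here is a minimal sketch in Python. The counts are the ones used in the calculation above; the variable names and the `is_independent` helper are my own, and I use exact fractions so that the equality test is not affected by floating-point rounding.

```python
from fractions import Fraction

# Counts used in the calculation above (5000 customers in total).
TOTAL = 5000
N_M = 2500        # customers who bought mozzarella
N_T = 3000        # customers who bought tomatoes
N_M_AND_T = 2000  # customers who bought both

p_m = Fraction(N_M, TOTAL)
p_t = Fraction(N_T, TOTAL)
p_both = Fraction(N_M_AND_T, TOTAL)


def is_independent(p_a, p_b, p_a_and_b):
    """Return True when P(A and B) equals P(A) * P(B) exactly."""
    return p_a_and_b == p_a * p_b


print(float(p_both), float(p_m * p_t))     # the two sides of the definition
print(is_independent(p_m, p_t, p_both))    # False: the events are not independent
```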
I have to say that I've never seen the term "lift" before. But its definition is perfectly reasonable and we could calculate independence in those terms. We can rearrange the definition of independence to give
$$P(A \wedge B) = P(A)P(B)$$
$$\frac{P(A \wedge B)}{P(A)P(B)} = 1$$
$$\text{lift} = 1$$
to see that two events are independent if the associated lift is exactly 1. Your notes correctly show that lift can be rewritten as
$$\frac{\frac{P(A \wedge B)}{P(A)}}{P(B)}$$
Here we have a lift of
$$0.4 / (0.5 \times 0.6) = 1.33...$$
or
$$0.8 / 0.6 = 1.33...$$
as you correctly calculated in your notes. Since the lift is not exactly 1 we conclude that the two events are not independent.
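The two ways of computing lift can be compared in a few lines. This is only a sketch: the function name `lift` and the variable names are mine, and the probabilities are the ones computed from the table.

```python
from fractions import Fraction


def lift(p_a, p_b, p_a_and_b):
    """Lift as P(A and B) / (P(A) * P(B))."""
    return p_a_and_b / (p_a * p_b)


# P(M), P(T) and P(M and T) from the data table.
p_m, p_t, p_both = Fraction(1, 2), Fraction(3, 5), Fraction(2, 5)

direct = lift(p_m, p_t, p_both)
# Same quantity written as (P(M and T) / P(M)) / P(T).
via_conditional = (p_both / p_m) / p_t

print(direct, via_conditional)  # both forms give the same fraction
print(direct == 1)              # False, so the events are not independent
```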
I would have approached the question of correlation via conditional probabilities, as follows. Two events $A$ and $B$ are positively correlated if $P(A | B) > P(A)$. That is to say, $A$ and $B$ are positively correlated if the probability of $A$ given $B$ is greater than the unconditional probability of $A$. In other words, if we discover that event $B$ has occurred then the chances that event $A$ has occurred increase.
The conditional probability is defined as $$P(A | B) = \frac{P(A \wedge B)}{P(B)}$$ So in this example $P(M | T) = P(M \wedge T) / P(T) = 0.4 / 0.6 = 0.66...$ whereas $P(M) = 0.5$, so we conclude that $M$ and $T$ are positively correlated.
But hang on! Surely correlation is supposed to be symmetric. So let's test whether $P(T | M) > P(T)$. We have $P(T | M) = 0.4 / 0.5 = 0.8$ and $P(T) = 0.6$ so, yes, they are positively correlated.
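Both directions of the conditional test can be written out as follows. The `conditional` helper and the variable names are my own; the inputs are the probabilities from the table.

```python
from fractions import Fraction


def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B)."""
    return p_a_and_b / p_b


p_m, p_t, p_both = Fraction(1, 2), Fraction(3, 5), Fraction(2, 5)

p_m_given_t = conditional(p_both, p_t)  # P(M | T)
p_t_given_m = conditional(p_both, p_m)  # P(T | M)

# Each conditional probability is compared with its own unconditional one.
print(p_m_given_t, p_m_given_t > p_m)
print(p_t_given_m, p_t_given_m > p_t)
```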
A lucky escape? We can be more smug than that. Let's rewrite our definition of correlation
$$P(M | T) > P(M)$$
$$\frac{P(M | T)}{P(M)} > 1$$
$$\frac{\frac{P(M \wedge T)}{P(T)}}{P(M)} > 1$$
$$\frac{P(M \wedge T)}{P(T)P(M)} > 1$$
If we approach correlation the other way we get the same result
$$P(T | M) > P(T)$$
$$\frac{P(T | M)}{P(T)} > 1$$
$$\frac{\frac{P(T \wedge M)}{P(M)}}{P(T)} > 1$$
$$\frac{P(T \wedge M)}{P(M)P(T)} > 1$$
which is the same as above because both the "and" operator $\wedge$ and multiplication are commutative.
As I'm sure you've noticed, $$\frac{P(M \wedge T)}{P(T)P(M)} > 1$$ is simply asking whether lift is above 1. That's probably the approach that your instructor was expecting you to take.
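The algebra above says that the two conditional tests and the lift test always agree. As a sanity check rather than a proof, here is a sketch that tries this on randomly generated 2x2 tables of counts. The function name, the range of the counts and the fixed seed are all arbitrary choices of mine.

```python
import random
from fractions import Fraction


def three_conditions(n_ab, n_a_only, n_b_only, n_neither):
    """Evaluate the three equivalent tests for a 2x2 table of counts."""
    total = n_ab + n_a_only + n_b_only + n_neither
    p_a = Fraction(n_ab + n_a_only, total)
    p_b = Fraction(n_ab + n_b_only, total)
    p_ab = Fraction(n_ab, total)
    return (
        p_ab / p_b > p_a,        # P(A | B) > P(A)
        p_ab / p_a > p_b,        # P(B | A) > P(B)
        p_ab / (p_a * p_b) > 1,  # lift > 1
    )


rng = random.Random(0)  # fixed seed so the check is repeatable
results = [
    three_conditions(*(rng.randint(1, 50) for _ in range(4)))
    for _ in range(1000)
]
# For every table the three answers should be identical.
all_agree = all(len(set(r)) == 1 for r in results)
print(all_agree)

# The mozzarella/tomato table: both, mozzarella only, tomatoes only, neither.
print(three_conditions(2000, 500, 1000, 1500))
```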
A couple of additional points to fully close the loop with your notes. You correctly calculated `lift(Moz => NoTom) = 0.5`, but that information was not required to answer the question because you had already calculated `lift(Moz => Tom)`. `Support({X} -> {Y})` looks very much like a definition of $P(X \wedge Y)$ and `Confidence({X} -> {Y})` looks like $P(Y | X)$.
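To make that reading of support, confidence and lift concrete, here is a sketch that computes all three from the joint counts. The function names mirror the terms in your notes, but the implementation is my own interpretation of them. The off-diagonal counts (500, 1000, 1500) are derived by subtraction from the totals used earlier.

```python
from fractions import Fraction

TOTAL = 5000
# Joint counts: 2500 bought mozzarella, 3000 bought tomatoes, 2000 bought both.
counts = {
    ("Moz", "Tom"): 2000,
    ("Moz", "NoTom"): 500,
    ("NoMoz", "Tom"): 1000,
    ("NoMoz", "NoTom"): 1500,
}


def marginal(item):
    """P(item), summed over every joint outcome that contains it."""
    return Fraction(sum(n for pair, n in counts.items() if item in pair), TOTAL)


def support(x, y):
    """Read as P(X and Y)."""
    return Fraction(counts[(x, y)], TOTAL)


def confidence(x, y):
    """Read as P(Y | X)."""
    return support(x, y) / marginal(x)


def lift(x, y):
    """Read as P(Y | X) / P(Y)."""
    return confidence(x, y) / marginal(y)


print(support("Moz", "Tom"), confidence("Moz", "Tom"), lift("Moz", "Tom"))
print(lift("Moz", "NoTom"))
```

The last line reproduces the value of 0.5 from your notes as the fraction 1/2.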
I have two final notes to place this all in a broader context.
First, my definition of correlation fits well with your definition of correlation as "does buying one of them increase or decrease the probability of buying the other". But our definitions are not completely standard: the word "correlation" is usually associated with some specific correlation measure -- most often Pearson's correlation coefficient but also things like Kendall's rank correlation coefficient. Having said that, our definition is totally reasonable and I'm fairly confident that we could prove that, for example, Kendall's rank correlation coefficient is positive if and only if the events are positively correlated in the sense defined above.
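For Pearson's coefficient the link is easy to see if we encode each event as a 0/1 indicator variable. That encoding is an assumption of this sketch, not something taken from your notes. The covariance of the two indicators is $P(M \wedge T) - P(M)P(T)$, so the coefficient is positive exactly when lift is above 1. A minimal numerical illustration, with variable names of my own choosing:

```python
import math

# Probabilities from the data table.
p_m, p_t, p_both = 0.5, 0.6, 0.4

# Covariance of the two 0/1 indicator variables.
covariance = p_both - p_m * p_t

# Pearson's coefficient: covariance divided by the product of the
# standard deviations of the two indicators.
pearson = covariance / math.sqrt(p_m * (1 - p_m) * p_t * (1 - p_t))

print(covariance > 0, pearson > 0)  # both share the sign of (lift - 1)
print(round(pearson, 3))
```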
Second, inspired by the definition of lift in probabilistic terms in your notes, I've conflated historical frequencies given in the table with probabilities. This is standard practice in much of data science and certainly in exercises but it is not completely uncontroversial. Probabilities are about the future and you have data about the past. The extent to which you can infer future probabilities from past data is philosophically unresolved. But that is not something to worry about in this situation.