To answer your question:
- $S$ in Shannon entropy represents a discrete random variable with values $s_{1}, s_{2}, \ldots, s_{n}$.
- $S$ in information gain represents a set of training examples, each of the form $({\textbf{s}}, t) = (s_{1}, s_{2}, s_{3}, \ldots, s_{k}, t)$, where $s_{a} \in vals(a)$ is the value of the $a^{\text{th}}$ attribute or feature of example ${\textbf{s}}$ and $t$ is the class label.
Below is the relevant information from Wikipedia.
Shannon entropy: wiki link
Given a discrete random variable $X$, with possible outcomes $x_{1}, x_{2}, \ldots, x_{n}$, which occur with probability $\mathrm{P}(x_{1}), \ldots, \mathrm{P}(x_{n})$, the entropy of $X$ is formally defined as:
$$\mathrm{H}(X) = -\sum_{i=1}^{n} \mathrm{P}(x_{i}) \log \mathrm{P}(x_{i})$$
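As a quick sanity check, here is a minimal Python sketch of that formula, estimating the probabilities from an observed sample (the function name is my own):

```python
from collections import Counter
from math import log2

def shannon_entropy(values):
    """H(X) = -sum(P(x) * log2(P(x))), with P estimated from sample frequencies."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

# A fair coin is maximally uncertain over two outcomes: 1 bit of entropy.
print(shannon_entropy(["H", "T", "H", "T"]))  # → 1.0
```

Using `log2` gives the entropy in bits; the natural log would give nats instead.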
Information gain: wiki link
Let $T$ denote a set of training examples, each of the form $({\textbf{x}}, y) = (x_{1}, x_{2}, x_{3}, \ldots, x_{k}, y)$, where $x_{a} \in vals(a)$ is the value of the $a^{\text{th}}$ attribute or feature of example ${\textbf{x}}$ and $y$ is the corresponding class label. The information gain for an attribute $a$ is defined in terms of Shannon entropy $\mathrm{H}(-)$ as follows. For a value $v$ taken by attribute $a$, let
$$S_{a}(v) = \{{\textbf{x}} \in T \mid x_{a} = v\}$$
be defined as the set of training inputs of $T$ for which attribute $a$ is equal to $v$. Then the information gain of $T$ for attribute $a$ is the difference between the a priori Shannon entropy $\mathrm{H}(T)$ of the training set and the conditional entropy $\mathrm{H}(T \mid a)$:
$$\mathrm{H}(T \mid a) = \sum_{v \in vals(a)} \frac{|S_{a}(v)|}{|T|} \cdot \mathrm{H}\left(S_{a}(v)\right)$$
$$IG(T, a) = \mathrm{H}(T) - \mathrm{H}(T \mid a)$$
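To make the distinction concrete, here is a small self-contained Python sketch of $IG(T, a)$: entropy of the labels is computed over the whole set and over each subset $S_{a}(v)$, then weighted and subtracted as in the formulas above (the dataset and names are hypothetical):

```python
from collections import Counter
from math import log2

def shannon_entropy(values):
    """H of a discrete distribution, estimated from sample frequencies."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def information_gain(examples, a):
    """IG(T, a) = H(T) - H(T|a); examples are ({attribute: value}, label) pairs."""
    n = len(examples)
    h_t = shannon_entropy([y for _, y in examples])        # a priori entropy H(T)
    h_t_given_a = 0.0
    for v in {x[a] for x, _ in examples}:                  # each v in vals(a)
        subset = [y for x, y in examples if x[a] == v]     # labels of S_a(v)
        h_t_given_a += len(subset) / n * shannon_entropy(subset)
    return h_t - h_t_given_a

# Hypothetical toy set: "outlook" perfectly predicts the label,
# so the conditional entropy is 0 and IG equals H(T) = 1 bit.
T = [({"outlook": "sunny"}, "no"),
     ({"outlook": "rain"},  "yes"),
     ({"outlook": "sunny"}, "no"),
     ({"outlook": "rain"},  "yes")]
print(information_gain(T, "outlook"))  # → 1.0
```

Note how the entropy function is applied to *label multisets* here, while in the first definition it applies to outcomes of a random variable — that is exactly the notational overlap your question is about.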