<p>Zhongyuan Zhao, Ph.D. in Computer Engineering. CFA Level 3 Candidate. (zhongyuan.zhao@rice.edu, https://zhongyuanzhao.com; feed generated by Jekyll, 2023-02-01)</p>

<h1>Oxford Mathematics Lectures: “Networks (Renaud Lambiotte)”, “Riemannian Geometry (Jason Lotay)”</h1>
<p>2022-10-27 · https://zhongyuanzhao.com/posts/2022/10/Oxford-Mathematics-Networks-Lambiotte</p>
<p>This is a good lecture series on network science (<a href="https://www.youtube.com/watch?v=TQKgB0RnjeY&list=PL4d5ZtfQonW0MsGE4Pn12rxUprPXB4_VS">YouTube playlist</a>) by <a href="https://www.maths.ox.ac.uk/people/renaud.lambiotte">Renaud Lambiotte</a> from <a href="https://www.youtube.com/c/OxfordMathematics">Oxford Mathematics (YouTube Channel)</a>.</p>
<p>Reference book:<br />
<a href="https://www.worldscientific.com/worldscibooks/10.1142/q0033#t=aboutBook">Lambiotte, R. and Masuda, N., 2021. A guide to temporal networks. World Scientific.</a></p>
<p>Course archive: <a href="https://courses-archive.maths.ox.ac.uk/node/49460">C5.4 Networks</a> (course materials available for download).</p>
<p>Course synopsis:</p>
<ol>
<li><a href="https://www.youtube.com/watch?v=TQKgB0RnjeY&list=PL4d5ZtfQonW0MsGE4Pn12rxUprPXB4_VS">Introduction and short overview of useful mathematical concepts (2 lectures): Networks as abstractions; Renewal processes; Random walks and diffusion; Power-law distributions; Matrix algebra; Markov chains; Branching processes.</a></li>
<li><a href="https://www.youtube.com/watch?v=zF5nVMG-Big&list=PL4d5ZtfQonW0MsGE4Pn12rxUprPXB4_VS">Basic structural properties of networks (2 lectures): Definition; Degree distribution; Measures derived from walks and paths; Clustering coefficient; Centrality Measures; Spectral properties.</a></li>
<li><a href="https://www.youtube.com/watch?v=cq-rAUwQoaM&list=PL4d5ZtfQonW0MsGE4Pn12rxUprPXB4_VS&index=3">Models of networks (2 lectures): Erdos-Rényi random graph; Configuration model; Network motifs; Growing network with preferential attachment.</a></li>
<li><a href="https://www.youtube.com/watch?v=W_A6NbqpTW8&list=PL4d5ZtfQonW0MsGE4Pn12rxUprPXB4_VS&index=4">Community detection (2 lectures): Newman-Girvan Modularity; Spectral optimization of modularity; Greedy optimization of modularity.</a></li>
<li><a href="https://www.youtube.com/watch?v=7p9ImBpxlG8&list=PL4d5ZtfQonW0MsGE4Pn12rxUprPXB4_VS&index=5">Dynamics, time-scales and Communities (2 lectures): Consensus dynamics; Timescale separation in dynamical systems and networks; Dynamically invariant subspaces and externally equitable partitions</a></li>
<li><a href="https://www.youtube.com/watch?v=cctHyGe5D_k&list=PL4d5ZtfQonW0MsGE4Pn12rxUprPXB4_VS&index=6">Dynamics I: Random walks (2 lectures): Discrete-time random walks on networks; PageRank; Mean first-passage and recurrence times; Respondent-driven sampling; Continuous-Time Random Walks</a></li>
<li><a href="https://www.youtube.com/watch?v=---gzhcMEHA&list=PL4d5ZtfQonW0MsGE4Pn12rxUprPXB4_VS&index=7">Random walks to reveal network structure (2 lectures): Markov stability; Infomap; Walktrap; Core–periphery structure; Similarity measures and kernels</a></li>
<li><a href="https://www.youtube.com/watch?v=67xLOcA5Qbs&list=PL4d5ZtfQonW0MsGE4Pn12rxUprPXB4_VS&index=8">Dynamics II: Epidemic processes (2 lectures): Models of epidemic processes; Mean-Field Theories and Pair Approximations</a></li>
</ol>
<p>Another related lecture series, on Riemannian geometry, is taught by Jason Lotay from Oxford Mathematics.</p>
<p>Course archive: <a href="https://courses-archive.maths.ox.ac.uk/node/51092">C3.11 Riemannian Geometry</a></p>
<p>Lecture videos:
<a href="https://www.youtube.com/watch?v=wZgM3u8UkNs&list=PL4d5ZtfQonW17IBjdLKcfQVBuuKaWnxbx">https://www.youtube.com/watch?v=wZgM3u8UkNs&list=PL4d5ZtfQonW17IBjdLKcfQVBuuKaWnxbx</a></p>
<ol>
<li>Riemannian manifolds: basic examples of Riemannian metrics, Levi-Civita connection.</li>
<li>Geodesics: definition, first variation formula, exponential map, minimizing properties of geodesics.</li>
<li>Curvature: Riemann curvature tensor, sectional curvature, Ricci curvature, scalar curvature.</li>
<li>Riemannian submanifolds: examples, second fundamental form, Gauss–Codazzi equations.</li>
<li>Jacobi fields: Jacobi equation, conjugate points.</li>
<li>Completeness: Hopf–Rinow and Cartan–Hadamard theorems</li>
<li>Constant curvature: classification of complete manifolds with constant curvature.</li>
<li>Second variation and applications: second variation formula, Bonnet–Myers and Synge’s theorems.</li>
</ol>

<h1>A Gentle Introduction to Distributed Link Scheduling in Self-organizing Wireless Networks: Part I</h1>
<p>2022-05-29 · https://zhongyuanzhao.com/posts/2022/05/distributed-link-scheduling-for-wireless-multihop-networks-part1</p>
<p><strong>Author: <a href="https://zhongyuanzhao.com">Zhongyuan Zhao</a></strong></p>
<p>If you came across this article, chances are you are interested in wireless networks and/or machine learning.
This post is about using machine learning to improve wireless networks, especially when both operate in a fully distributed manner. In this first part, I introduce the basic idea of self-organizing networks so that you can better understand the prerequisites of this work, such as why distributed solutions are emphasized.</p>
<h2 id="1-what-is-self-organizing">1. What is self-organizing?</h2>
<figure>
<img src="https://i.imgur.com/zDyr3N4.jpg" alt="a busy cafe" style="width:600px" class="center" />
<figcaption align="center"><b>Fig.1 - As a social species, humans are self-organizing in public.</b> Source: <a href="https://upserve.com/restaurant-insider/keeping-restaurant-guests-happy-during-long-line-season1/">Upserve</a></figcaption>
</figure>
<p>Imagine you and your friends are sitting in a crowded coffee shop, chatting about your favorite topics, with music and other ongoing conversations in the background. As humans, we can perform such tasks automatically without even noticing. But if you think about it, in order to have a meaningful conversation with others, we need to at least speak the same language and follow the same social rules of conversation. The importance of language is obvious, but the rules of conversation that we follow implicitly are often overlooked.</p>
<p>Social norms are how we self-organize as a social species. We can form and maintain order in public without an organizer giving us instructions at every moment. For example, in a conversation, we follow cultural rules to signal the other party when we are about to start or end a sentence, and to control our volume in public spaces. Another example is our traffic system. As drivers, we have to follow traffic rules to avoid accidents, including signaling others before making turns or slowing down, controlling our speed, and monitoring our surroundings. Of course, we have to follow the signals of traffic lights and/or police at busy intersections. But in general, the traffic system is mostly self-organized.</p>
<p>Wireless networks are analogous to the coffee shop scenario, where wireless devices talk to each other in the presence of background noise (music) and interference (distractions from other conversations). <strong>Roughly speaking, wireless communications deal with the language and attention between two devices, whereas wireless networking is about the social norms or rules.</strong> However, wireless devices are dumb and can only act on what they are programmed to do. It may be too hard, too costly, or even unnecessary to program wireless devices to be self-organizing, especially in a busy and crowded environment.</p>
<h2 id="2-what-is-not-a-self-organizing-network">2. What is (not) a self-organizing network?</h2>
<p>Many engineering systems are organized top-down. For example, in railway systems and cellular networks, the physical and data traffic flows through pre-planned networks, following the instructions of dedicated schedulers. Without the infrastructure, the physical or data payloads cannot move on their own. However, as long as the infrastructure is in place and functions properly, the entire system can be very efficient in resource utilization. Think about the average labor and energy costs of moving a passenger for 1 km by train versus by car.</p>
<p>The infrastructure of the 5th generation (5G) cellular networks is called the cloud radio access network (Cloud-RAN), as illustrated in Fig. 2. Cloud-RAN comprises base-stations connected to the telecommunication network and the Internet through wired/wireless fronthaul and backhaul networks. Each base-station provides network connectivity to mobile devices in a macro or small cell (see Fig. 2). Cellular network operators carefully plan and lay out these cells in a market based on factors like population density and terrain, to optimize the coverage, bandwidth, cost, and flexibility of their cellular networks. This process is called <em>network optimization</em>.</p>
<figure>
<img src="/images/5g_metis.jpeg" alt="infrastructure-based networks" style="width:600px" class="center" />
<figcaption align="center"><b>Fig.2 - Infrastructure-based network: 5G cloud radio access network (Cloud-RAN).</b></figcaption>
</figure>
<p>To maximize the utilization of network resources, such as bandwidth and energy, a cellular base-station schedules the transmissions of the mobile devices attached to it, and manages their radio parameters as well.
To do so, a cellular base-station is equipped with decent real-time computing power, a dedicated backhaul network, and high-performance radio transceivers.
Running a cellular network requires a huge upfront investment and years of construction, as well as continuous maintenance and upgrades.</p>
<p>Unlike cellular networks, Wi-Fi networks are organized bottom-up. Anyone can install a Wi-Fi router (access point) at home.
The medium access control (MAC, or media access control) of Wi-Fi is self-organized, meaning that a Wi-Fi access point does not schedule transmissions or manage the parameters of the mobile devices attached to it.
When a Wi-Fi device has data packets to transmit, it first listens to the medium (wireless channel) for a brief time. If there is no conversation in the background, it starts transmitting; otherwise it waits for a while and then tries again.
This self-organizing feature makes a Wi-Fi access point roughly 1,000 times cheaper than a cellular base-station, and that’s why Wi-Fi is so successful. In fact, it is estimated that as of 2022, 51% of Internet traffic goes through Wi-Fi, while only 19.6% goes through cellular networks [<a href="https://twiki.cern.ch/twiki/pub/HEPIX/TechwatchNetwork/HtwNetworkDocuments/white-paper-c11-741490.pdf">Cisco VNI report, 2017-2022</a>, Fig. 22].
Even cellular network operators install Wi-Fi access points at hot spots to offload traffic from their cellular networks.</p>
<p>However, the listen-before-talk approach used in the Wi-Fi MAC only works for small networks.
A Wi-Fi hot spot can become congested when there are many Wi-Fi devices around, not because their total bandwidth demand exceeds the capacity of the access point, but because of too many over-the-air collisions.
You may have experienced such congestion at a café, hotel, or library.</p>
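The listen-before-talk procedure described above can be sketched as a toy simulation. This is only an illustration: the binary exponential backoff scheme and every parameter below are hypothetical, not values from the Wi-Fi (802.11) standard.

```python
import random

def listen_before_talk(busy_prob=0.3, max_attempts=8, seed=1):
    """Toy carrier-sense loop: sense the channel, transmit if it is idle,
    otherwise back off for a random number of time steps and retry."""
    rng = random.Random(seed)
    waited = 0
    for attempt in range(max_attempts):
        channel_busy = rng.random() < busy_prob  # listen to the medium
        if not channel_busy:
            return "transmitted", waited
        # Back off: the contention window doubles after each failed attempt.
        waited += rng.randrange(2 ** (attempt + 1))
    return "dropped", waited  # give up after sensing a busy channel too often

print(listen_before_talk())
```

The more devices contend for the channel, the more often carrier sensing finds it busy and the longer the backoffs grow, which is exactly the congestion behavior described above.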
<h2 id="3-why-self-organizing-wireless-networks">3. Why self-organizing wireless networks?</h2>
<p>In general, self-organizing wireless networks are more flexible but less efficient than planned infrastructure.
In many cases, the greater flexibility not only makes economic sense, but is also technically sound.
The traditional examples are <em>wireless ad-hoc networks</em> and <em>wireless sensor networks</em> operating in harsh or hostile environments, such as battlefields (Fig. 4), disaster areas, and remote locations, where network infrastructure is infeasible.</p>
<figure>
<img src="/images/adhoc_military.png" alt="military ad-hoc networks" style="width:600px" class="center" />
<figcaption align="center"><b>Fig.4 - Wireless multihop networks: mobile ad hoc networks in military communications.</b></figcaption>
</figure>
<p>In cellular and Wi-Fi networks, user devices are connected to the wired infrastructure (Internet) via a base-station or access point in a single hop.
In wireless ad hoc networks, however, the sender and recipient of data packets are typically connected through other user devices over multiple hops. As a result, wireless ad hoc networks belong to the class of <em>wireless multihop networks</em>, where a ‘hop’ refers to a wireless link.</p>
<figure>
<img src="/images/WMN_applications.png" alt="vehicles, drone fleets, drone bs, CubeSat" style="width:100%" />
<figcaption align="center"><b>Fig.3 - Applications of wireless multihop networks: (top left) vehicular communications, (top right) drone-assisted communications, (bottom left) drone swarm (bottom right) CubeSat constellation</b></figcaption>
</figure>
<p>Wireless multihop networks are playing a growing role in many emerging civilian applications, such as smart vehicles, drone fleets, the Internet of Things (IoT), wireless backhaul networks for small cells, and drone- and CubeSat-assisted wireless networks in 5G and beyond [Akyildiz 2022], as illustrated in Fig. 3.
For example, the exchange of safety-critical information (or control messages) among traveling vehicles (drones), as shown in the top left (bottom left) of Fig. 3, requires ultra-low latency and seamless network coverage.
To meet such requirements, it is better to establish direct connections between neighboring vehicles (drones) than to place a base-station in the middle.</p>
<p>In 5G and beyond, wireless multihop networks can also serve as the wireless backhaul that connects base-stations in small cells, or carried by drones and CubeSats, to the Internet (Fig. 3, right).
Small cells serve highly populated hotspots, such as central business districts, schools, and residential areas (see Fig. 2), whereas drone- and CubeSat-assisted wireless networks serve temporary hotspots and non-terrestrial communications (e.g., oceans, airplanes, mountains, and deserts).
For small cells, wireless backhaul avoids the high cost and inconvenience of installing wired backhaul.
For drone- and CubeSat-assisted networks, wireless backhaul is the only option [Akyildiz 2022].</p>
<p>Wireless multihop networks might also help address the challenge of massive access [Chen 2021] in the Internet of Things (IoT).
In the future, the proliferation of IoT devices could lead to very high connection density, e.g. 10 million wireless connections per $km^2$ by 2030 [Chen 2021], with each device periodically sending small payloads to its controller.
Although the actual bandwidth required by these payloads is low, the handshaking and authorization overheads generated by these IoT devices under the existing cellular architecture could consume all the bandwidth and jam the entire network.
If these IoT devices could instead form self-organizing networks, the situation would be significantly improved.</p>
<h2 id="4-synchronization">4. Synchronization</h2>
<p>Besides infrastructure, another dimension of the organization in wireless networks is synchronization.
In general, both infrastructure and synchronization can improve the resource efficiency and performance, at the cost of reduced flexibility.
We can further categorize wireless networks based on these two dimensions, as shown in the following table.</p>
<table>
<thead>
<tr>
<th> </th>
<th>synchronized</th>
<th>random access</th>
</tr>
</thead>
<tbody>
<tr>
<td>Infrastructure</td>
<td>Cellular, 5G networks</td>
<td>Wi-Fi</td>
</tr>
<tr>
<td>Ad hoc</td>
<td>Vehicular/Flying/Tactical Ad-hoc Networks</td>
<td>Wireless Ad hoc / Sensor Networks</td>
</tr>
</tbody>
</table>
<p>In synchronized networks, wireless devices are synchronized to the same clock, such as the GPS signal, which enables them to communicate in precisely defined time slots without colliding with each other.
Therefore, synchronized networks can achieve better throughput, mobility support, and resource efficiency.
In random access networks, there is additional overhead to establish a link between unsynchronized devices, which
limits performance and resource efficiency. However, without the need for synchronization, it is very easy for devices in random access networks to self-organize. As a result, random access networks offer greater flexibility, lower energy consumption, and lower device cost.</p>
<p>It should be noted that some degree of both random access and synchronization is necessary in any wireless network. In cellular protocols, there is a dedicated random access channel that allows new devices to join the network. In Wi-Fi and other random access networks, the receiver must synchronize itself to the transmitter in order to decode the data.</p>
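The throughput benefit of sharing a clock can be illustrated with the classic textbook model of slotted ALOHA, where synchronized nodes may only start transmitting at slot boundaries. Below is a minimal Monte-Carlo sketch; the node count and transmit probability are hypothetical.

```python
import random

def slotted_aloha_throughput(n_nodes=50, tx_prob=0.02, n_slots=100_000, seed=0):
    """Estimate the fraction of successful slots: a slot succeeds
    iff exactly one node transmits in it (otherwise idle or collision)."""
    rng = random.Random(seed)
    successes = sum(
        sum(rng.random() < tx_prob for _ in range(n_nodes)) == 1
        for _ in range(n_slots)
    )
    return successes / n_slots

# At the optimal load (n_nodes * tx_prob = 1), theory predicts a peak
# throughput of 1/e ≈ 0.37 successful slots; without slot synchronization
# (pure ALOHA), the theoretical peak halves to 1/(2e) ≈ 0.18.
print(round(slotted_aloha_throughput(), 2))
```

This is of course a caricature of real MAC protocols, but it captures why the synchronized column of the table above tends to deliver higher throughput than the random access column.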
<h2 id="5-summary">5. Summary</h2>
<p>In this part, I introduced the concept of self-organizing wireless networks and their applications. The takeaway is that self-organizing and infrastructure-based wireless networks have different priorities: the former emphasizes flexibility and reliability, whereas the latter focuses on performance and resource efficiency.
Therefore, technical solutions for self-organizing wireless networks need to be infrastructure-free, meaning they should be implemented in a distributed manner.
In the next part, I will introduce some basic networking solutions for wireless multihop networks, and how machine learning can improve them.</p>
<h2 id="references">References</h2>
<ul>
<li>[Chen 2021] X. Chen, D. W. K. Ng, W. Yu, E. G. Larsson, N. Al-Dhahir, and R. Schober, “Massive access for 5G and beyond,” IEEE Journal on Selected Areas in Communications, vol. 39, no. 3, pp. 615–637, 2021.</li>
<li>[Akyildiz 2022] I. F. Akyildiz, A. Kak, and S. Nie, “6G and beyond: The future of wireless communications systems,” IEEE Access, vol. 8, pp. 133995–134030, 2020.</li>
</ul>

<h1>Apply Arbitrary Custom Gradient Through Squared Loss</h1>
<p>2022-03-18 · https://zhongyuanzhao.com/posts/2022/03/apply-arbitrary-gradient-through-squared-loss</p>
<p>By <a href="https://zhongyuanzhao.com">Zhongyuan Zhao</a></p>
<h2 id="tldr">TL;DR</h2>
<p>In some machine learning (ML) problems, such as <a href="https://spinningup.openai.com/en/latest/spinningup/rl_intro3.html">policy gradient</a> reinforcement learning algorithms, you may have a non-differentiable loss/objective function with respect to (w.r.t.) the final or intermediate output $\mathbf{y}$ of the downstream ML pipeline.
For example, a custom gradient, $\mathbf{\delta} = \nabla_{\mathbf{y}}l_{b}(\mathbf{y}, r)$, may depend on the feedback $r$ from the environment or some blackbox process, and therefore cannot be implemented via the automatic differentiation in <em>Tensorflow</em> and <em>PyTorch</em>.
A quick trick to apply an arbitrary gradient $\mathbf{\delta}$ in backpropagation is to use the squared loss:</p>
<p>
\begin{equation*}
l(\mathbf{y},\tilde{\mathbf{y}},\mathbf{\delta}) = \frac{1}{2} \lVert\mathbf{y} - (\tilde{\mathbf{y}} + \mathbf{\delta})\rVert_{2}^{2}
= \frac{1}{2} \sum_{k=1}^{n}\left[\mathbf{y}_{k} - (\tilde{\mathbf{y}}_{k} + \mathbf{\delta}_{k})\right]^2\;.
\end{equation*}
</p>
<p>If your code is developed in <em>Tensorflow 1</em> (with sessions and a computational graph), this trick can save you a lot of trouble in migrating to <em>Tensorflow 2</em> with its slower eager execution, or to <em>PyTorch</em> with customized autograd.</p>
<h2 id="a-primer-on-gradient-descent">A primer on gradient descent</h2>
<p>Given the input data $\mathbf{X}$ and labels $\mathbf{y}^*$, the ML algorithm or artificial neural network (ANN) outputs the prediction $\mathbf{y}=f(\mathbf{X};\mathbb{\Theta})$.
Here, we denote a matrix by a bold upper-case letter, such as $\mathbf{X}$, a vector by a bold lower-case letter, such as $\mathbf{y}$, and the $k$th element of vector $\mathbf{y}$ by subscript $k$, as $\mathbf{y}_{k}$.
The ML algorithm or ANN is represented as a parameterized function $f(\mathbf{X};\mathbb{\Theta})$, where $\mathbb{\Theta}$ is the set of parameters.
The training of the parameters is carried out by an optimizer, which iteratively updates the parameters through gradient descent in the direction that minimizes a loss function.</p>
<p>The loss function of the prediction, the label, and/or the set of parameters, denoted $l(\mathbf{y}, \mathbf{y}^*, \mathbb{\Theta})$, is the objective function minimized during training (optimization).
You have probably already encountered several commonly used loss functions, such as <a href="https://en.wikipedia.org/wiki/Cross_entropy">cross entropy</a> for classification, <a href="https://en.wikipedia.org/wiki/Mean_squared_error">mean squared error</a> for regression, and the <a href="https://towardsdatascience.com/intuitions-on-l1-and-l2-regularisation-235f2db4c261">$L^1$ and $L^2$ norms</a> for regularization.
The gradient is typically obtained as the derivative of the loss function w.r.t. the parameters.
Following the <a href="https://en.wikipedia.org/wiki/Chain_rule">chain rule</a>, the gradient can be written as:</p>
<p>
\begin{equation}
\nabla_{\mathbb{\Theta}} l(\mathbf{y}, \mathbf{y}^*, \mathbb{\Theta}) = \frac{\partial l(\mathbf{y}, \mathbf{y}^*, \mathbb{\Theta})}{\partial \mathbf{y}} \frac{\partial \mathbf{y}}{\partial \mathbb{\Theta}}. \label{eq:gradient}
\end{equation}
</p>
<p>During training, the parameters are updated as:</p>
<p>
\begin{equation}
\mathbb{\Theta} \leftarrow \mathbb{\Theta} - \alpha\nabla_{\mathbb{\Theta}} l(\mathbf{y}, \mathbf{y}^*, \mathbb{\Theta}), \label{eq:gd}
\end{equation}
</p>
<p>where $0<\alpha<1$ is the learning rate.</p>
<h2 id="limitations-of-default-automatic-differentiation">Limitations of default automatic differentiation</h2>
<p>The gradient in \eqref{eq:gradient} has two components: the derivative of the loss function w.r.t. the output $\mathbf{y}$, $\frac{\partial l(\mathbf{y}, \mathbf{y}^{*}, \mathbb{\Theta})}{\partial \mathbf{y}}$, and the derivative of the output $\mathbf{y}$ w.r.t. the parameters $\mathbb{\Theta}$, $\frac{\partial \mathbf{y}}{\partial \mathbb{\Theta}}$.
In most supervised and unsupervised learning, the loss function $l(\cdot)$ and the machine learning pipeline $f(\cdot;\mathbb{\Theta})$ are both differentiable, which allows the backpropagation of the gradient being carried out by the <a href="https://en.wikipedia.org/wiki/Automatic_differentiation">automatic differentiation</a> mechanism built in <em>Tensorflow</em> and <em>PyTorch</em>.</p>
<p>However, in reinforcement learning, especially when developing new approaches, you may end up with a differentiable ML pipeline $f(\cdot;\mathbb{\Theta})$ and a non-differentiable loss/objective function, denoted $l_{b}(\mathbf{y}, r)$, where $r$ is the observed feedback from the environment.
This is because in reinforcement learning, the objective is often to maximize or minimize a performance metric that has no analytical expression and can only be observed from the interaction between the actions of the agent (prediction $\mathbf{y}$) and the environment.</p>
<p>If your ML problem requires a customized or non-differentiable loss/objective function, going beyond the set of loss functions built into <em>Tensorflow</em> and <em>PyTorch</em> is quite burdensome.
In <em>Tensorflow 2</em>, you need to learn <a href="https://www.tensorflow.org/guide/advanced_autodiff">advanced automatic differentiation</a> and work with the <code class="language-plaintext highlighter-rouge">tf.GradientTape</code> API and the <code class="language-plaintext highlighter-rouge">apply_gradients</code> function.
In <em>PyTorch</em>, you need to <a href="https://pytorch.org/tutorials/beginner/examples_autograd/polynomial_custom_function.html">define new autograd functions</a>.
You first need to convert your data to tensors and then operate on those tensors using the built-in functions of <em>Tensorflow</em> or <em>PyTorch</em>.</p>
<h2 id="apply-custom-gradient-through-squared-loss">Apply custom gradient through squared Loss</h2>
<p>Let’s say you have worked out a formula to approximate (or guess) the gradient of a blackbox loss or objective function w.r.t. the prediction, $\nabla_{\mathbf{y}} l_{b}(\mathbf{y}, r)=\frac{\partial l_{b}(\mathbf{y}, r)}{\partial \mathbf{y}}$, which, by the way, is the major effort of <a href="https://spinningup.openai.com/en/latest/algorithms/vpg.html">policy gradient</a> reinforcement learning algorithms.
You may prefer to implement that formula with numerical packages like <em>numpy</em> and <em>scipy</em> rather than the built-in functions of <em>Tensorflow</em> or <em>PyTorch</em>, since the former may offer better performance and/or functionality, or simply make debugging much easier.</p>
<p>In a reinforcement learning or customized learning setting, you first collect the experience tuples of state (input data), action (prediction), and reward, $\langle\mathbf{X}^{(t)}, \tilde{\mathbf{y}}^{(t)}, r^{(t)}\rangle$ for $t=0,\dots,T$, and then compute (or guess) the derivative of your loss/objective function w.r.t. the action (prediction) as $\mathbf{\delta} = \nabla_{\mathbf{y}}l_{b}(\tilde{\mathbf{y}}, r)$.
Note that with exploration, the executed action $\tilde{\mathbf{y}}^{(t)}$ does not necessarily equal the output $\mathbf{y}^{(t)}=f(\mathbf{X}^{(t)};\mathbb{\Theta})$.
Instead of implementing your gradient estimation entirely in Tensorflow or PyTorch, you can first compute the gradient $\mathbf{\delta}^{(t)}$ with whatever packages you like, then feed it into a <code class="language-plaintext highlighter-rouge">placeholder</code> and apply it to the backpropagation through an off-the-shelf optimizer and the built-in <a href="https://www.tensorflow.org/api_docs/python/tf/keras/losses/MeanSquaredError">mean squared loss</a> or the following squared loss:</p>
<p>
\begin{equation}
l_{s}(\mathbf{y},\tilde{\mathbf{y}},\mathbf{\delta}) = \frac{1}{2} \lVert\mathbf{y} - (\tilde{\mathbf{y}} + \mathbf{\delta})\rVert_{2}^{2}
= \frac{1}{2} \sum_{k=1}^{n}\left[\mathbf{y}_{k} - (\tilde{\mathbf{y}}_{k} + \mathbf{\delta}_{k})\right]^2\;. \label{eq:loss}
\end{equation}
</p>
<p>This is because in the case of exploitation, where $\tilde{\mathbf{y}}^{(t)}=\mathbf{y}^{(t)}=f(\mathbf{X}^{(t)};\mathbb{\Theta})$, we have</p>
<p>
\begin{equation}
\frac{\partial l_{s}(\mathbf{y},\tilde{\mathbf{y}},\mathbf{\delta})}{\partial \mathbf{y}_k} = \mathbf{y}_{k} - (\tilde{\mathbf{y}}_{k} + \mathbf{\delta}_{k})=-\mathbf{\delta}_{k}\;, \label{eq:proof}
\end{equation}
</p>
<p>so a gradient-descent step on $l_{s}$ moves $\mathbf{y}$ toward $\tilde{\mathbf{y}} + \mathbf{\delta}$, i.e., it applies the update direction $\mathbf{\delta}$ (if $\mathbf{\delta}$ is a gradient to be descended, flip its sign before plugging it into \eqref{eq:loss}).</p>
<p>The difference between squared loss and mean-squared-error loss is just a constant factor of $2/n$, which can be compensated by setting a larger or smaller learning rate $\alpha$.</p>
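This is easy to verify numerically. Below is a minimal numpy check with toy values; note the sign: at $\tilde{\mathbf{y}}=\mathbf{y}$ the derivative of the squared loss w.r.t. $\mathbf{y}$ evaluates to $-\mathbf{\delta}$, so the optimizer's descent step moves $\mathbf{y}$ in the direction of $\mathbf{\delta}$.

```python
import numpy as np

# Exploitation case: the executed action equals the network output.
y = np.array([0.2, -1.0, 0.7])        # prediction y = f(X; Theta)
y_tilde = y.copy()                    # executed action (no exploration)
delta = np.array([0.05, -0.1, 0.3])   # custom gradient from a blackbox process

target = y_tilde + delta              # treated as a constant in the loss
loss = 0.5 * np.sum((y - target) ** 2)
grad_analytic = y - target            # d l_s / d y

# Finite-difference check of the analytical gradient.
eps = 1e-6
grad_fd = np.array([
    (0.5 * np.sum((y + eps * np.eye(3)[k] - target) ** 2) - loss) / eps
    for k in range(3)
])

print(np.allclose(grad_analytic, -delta))              # True
print(np.allclose(grad_fd, grad_analytic, atol=1e-4))  # True
```

In an actual Tensorflow 1 graph, `delta` would be fed through a placeholder each training step and the optimizer would simply minimize this squared loss, with the constant factor absorbed into the learning rate.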
<h2 id="open-questions">Open questions</h2>
<ol>
<li>Would \eqref{eq:loss} work in the case of exploration, where $\tilde{\mathbf{y}}^{(t)}\neq\mathbf{y}^{(t)}=f(\mathbf{X}^{(t)};\mathbb{\Theta})$?</li>
</ol>
<p>Honestly, I don’t know.
Maybe just try \eqref{eq:loss} directly, or scale $\mathbf{\delta}$ in \eqref{eq:loss} by a small constant $0<\varepsilon<1$.
We could also first run the forward pass to compute $\mathbf{y}^{(t)}=f(\mathbf{X}^{(t)};\mathbb{\Theta})$, then replace $\tilde{\mathbf{y}}$ in \eqref{eq:loss} with $\mathbf{y}^{(t)}$. In this case, we apply the gradient estimated at another point $\tilde{\mathbf{y}}^{(t)}$ to the current point $\mathbf{y}^{(t)}$.
In stochastic gradient descent, the estimated gradient is quite noisy anyway.</p>

<h1>Military and Amateur HF Radios – the Basics</h1>
<p>2022-03-07 · https://zhongyuanzhao.com/posts/2022/03/hf-radios</p>
<p>Since the escalation of the <a href="https://en.wikipedia.org/wiki/2022_Russian_invasion_of_Ukraine">Russo-Ukrainian War</a> on Feb. 24, 2022, high frequency (HF) radio has made <a href="https://www.nytimes.com/2022/03/03/business/media/bbc-shortwave-radio-ukraine.html">headlines</a> and sparked some <a href="https://www.rtl-sdr.com/radio-related-news-occurring-in-the-russia-ukraine-conflict/">discussions</a> in the amateur radio community. With low bandwidth (2.5 kHz–10 kHz per HF channel), HF radio can only be used for digital or analog voice and E-mail/chat/text messaging. However, if the Internet, satellites, and telecom networks were all blocked in a war or natural disaster, HF radio would be the only means of long-distance communication that does not rely on any infrastructure.</p>
<p>What makes HF (3-30MHz) radio extremely interesting is the <a href="https://www.electronics-notes.com/articles/antennas-propagation/ionospheric/hf-propagation-basics.php">Ionospheric Radio Propagation</a>, a.k.a. <a href="https://en.wikipedia.org/wiki/Skywave">skywave</a>.</p>
<blockquote>
<p>“In radio communication, skywave or skip refers to the propagation of radio waves reflected or refracted back toward Earth from the ionosphere, an electrically charged layer of the upper atmosphere. Since it is not limited by the curvature of the Earth, skywave propagation can be used to communicate beyond the horizon, at intercontinental distances. It is mostly used in the shortwave frequency bands.”</p>
<p>Source: <a href="https://en.wikipedia.org/wiki/Skywave">Wikipedia</a></p>
</blockquote>
<p>Specifically, there are two modes of skywave propagation, as illustrated in the following figures. Skywave at a lower incidence angle can reach longer distances, leaving a skip zone around the transmitter. The skip zone can be covered by near-vertical incidence skywave (NVIS). In theory, a 20-Watt handheld SSB HF radio can reach receivers thousands of kilometers away through oblique-incidence propagation (lower incidence angle), or receivers within hundreds of kilometers via NVIS.</p>
<p><img src="https://i.imgur.com/UKseQ9x.jpg" alt="Skip" />
<img src="https://i.imgur.com/LW34csx.jpg" alt="NVIS" /></p>
<p>The ionosphere is greatly affected by the time of day and by space weather, which in turn influence the workable frequency bands of skywave propagation. Therefore, users of AM broadcast services and single-sideband (SSB) communications in the HF bands need to schedule their radio frequencies accordingly: typically lower frequencies at night and higher frequencies during the day. This is a unique challenge of HF radio, which can be addressed by <a href="https://en.wikipedia.org/wiki/Automatic_link_establishment">automatic link establishment</a> (ALE).</p>
<p><img src="https://i.imgur.com/VI99HqC.jpg" alt="Ionosphere layers" /></p>
<h2 id="further-reading">Further reading</h2>
<p>The following educational materials can help you better understand HF radio:</p>
<h3 id="1-military-hf-radios-video-series-by-matthew-sherburne">1. Military HF Radios (Video series) by Matthew Sherburne</h3>
<ul>
<li><a href="https://www.youtube.com/watch?v=dZSLM7iFVMg">Intro</a></li>
<li><a href="https://www.youtube.com/watch?v=lzjYSoYuoXI">Episode 1 - RF Theory</a></li>
<li><a href="https://www.youtube.com/watch?v=AoI1RHQuZWQ">Episode 2 - Military HF History</a></li>
<li><a href="https://www.youtube.com/watch?v=PBQ0c1_3Ugw">Episode 3 - HF NVIS</a></li>
<li><a href="https://www.youtube.com/watch?v=QEBho6Xvzdo">Episode 4 - VOACAP Analysis</a></li>
<li><a href="https://www.youtube.com/watch?v=wdrIOKXF7jE">Episode 5 - HF Antennas</a></li>
<li><a href="https://www.youtube.com/watch?v=3viGM7AHvPM">Episode 6 - 2G and 3G ALE</a></li>
<li>Episode 7 - Digital Communications. (<a href="https://en.wikipedia.org/wiki/Advanced_Encryption_Standard">AES 256</a> )</li>
<li>Episode 8 - US Army MARS. (<a href="https://en.wikipedia.org/wiki/Military_Auxiliary_Radio_System">wikipedia</a>, <a href="https://www.usarmymars.org/">official site</a>)</li>
<li>Episode 9 - Lessons from the Field</li>
</ul>
<p><em>Note that episodes 7-9 in this series are missing for obvious reasons; I have added some links that I think are useful. This is the best introductory lecture series on HF radios I have seen so far.</em></p>
<p>A shorter (45 min) introduction of this topic is
“<a href="https://www.youtube.com/watch?v=9QIeG4LiFMg">The HF Renaissance in the US Army</a>” presented by Prof. Col. Stephen Hamilton from West Point. This presentation covers some of the missing episodes of Matthew Sherburne’s lecture including some cool field tests.</p>
<h3 id="2-rohde--schwarz-high-frequency-hf-learning-center">2. Rohde & Schwarz High Frequency (HF) Learning Center</h3>
<p><a href="https://www.rohde-schwarz.com/us/campaigns/rsa/adt/hf-learning-center_253628.html">https://www.rohde-schwarz.com/us/campaigns/rsa/adt/hf-learning-center_253628.html</a></p>
<h3 id="3-websdr">3. WebSDR</h3>
<p>Beginners and hobbyists can access web-based software-defined radios (SDRs) (<a href="http://websdr.org">http://websdr.org</a>) to get first-hand experience of MF and HF radio worldwide, for free, in a browser. Here is a screenshot of one of the earliest WebSDRs (<a href="http://websdr.ewi.utwente.nl:8901/">http://websdr.ewi.utwente.nl:8901/</a>), provided by the amateur radio club at the University of Twente in the Netherlands. Each bright line in the waterfall plot of the spectrum represents an active channel, and we can see that the HF band is very busy in the Netherlands.</p>
<p><img src="https://i.imgur.com/2lAKPc9.png" alt="Screenshot of the WebSDR at the University of Twente" /></p>
<h3 id="4-rtl-sdr--gnu-radio-for-diy">4. RTL-SDR & GNU Radio for DIY</h3>
<ul>
<li>RTL-SDR <a href="https://www.rtl-sdr.com/">https://www.rtl-sdr.com/</a></li>
<li>GNU Radio <a href="https://www.gnuradio.org/">https://www.gnuradio.org/</a></li>
<li>The National Association for Amateur Radio <a href="https://www.arrl.org/">https://www.arrl.org/</a></li>
<li>HAM Radio School <a href="https://www.hamradioschool.com/">https://www.hamradioschool.com/</a></li>
</ul>
<h3 id="5-if-you-just-wanna-buy-an-off-the-shelf-radio-receiver">5. If you just wanna buy an off-the-shelf radio receiver…</h3>
<p>You would probably see something like this on a commercial radio receiver: <strong>“AM/FM/LW/VHF/Shortwave SSB Radio”</strong>. To choose a proper one, you need to know this jargon:</p>
<ul>
<li>AM: Amplitude modulation; <a href="https://en.wikipedia.org/wiki/AM_broadcasting">AM broadcasting</a>, mostly in 525–1705 kHz (<a href="https://en.wikipedia.org/wiki/Medium_wave">medium wave</a>, or medium frequency (MF), 300 kHz–3 MHz), with 9 kHz or 10 kHz channel spacing. Range: about 400 km during the day and 2000 km at night.</li>
<li>FM: Frequency modulation; usually refers to <a href="https://en.wikipedia.org/wiki/FM_broadcasting">FM broadcasting</a> in 88.0–108.0 MHz. Range: mostly local stations.</li>
<li>SSB: <a href="https://en.wikipedia.org/wiki/Single-sideband_modulation">Single-sideband modulation</a>.</li>
<li>LW: Longwave, or low frequency (LF), 148.5–283.5 kHz, generally with 9 kHz spacing; has a limited number of services.</li>
<li>SW: Shortwave, or high frequency (HF), 3–30 MHz, mostly AM or SSB modes. Range: intercontinental.</li>
<li>VHF: Very high frequency, 30–300 MHz. VHF band 1: 54–88 MHz (TV); VHF band 2: 87.5–108 MHz (FM); VHF band 3: 174–216 MHz (TV).</li>
<li>Airband/aircraft band: 108–136 MHz, <a href="https://en.wikipedia.org/wiki/Airband">https://en.wikipedia.org/wiki/Airband</a>.</li>
<li>UHF: <a href="https://en.wikipedia.org/wiki/Ultra_high_frequency">Ultra high frequency</a>, 300–3000 MHz.</li>
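<p>The band ranges above amount to a simple frequency-to-band lookup. Here is a minimal sketch of that mapping (boundary values are approximate, taken from the list above; the band names are my own shorthand):</p>

```python
# Band ranges in MHz, as listed above; boundaries are approximate,
# and the names are shorthand labels for this example.
BANDS = [
    ("LF (longwave)",     0.1485, 0.2835),
    ("MF (AM broadcast)", 0.525,  1.705),
    ("HF (shortwave)",    3.0,    30.0),
    ("VHF",               30.0,   300.0),
    ("UHF",               300.0,  3000.0),
]

def classify(freq_mhz):
    """Return the name of the first band containing freq_mhz, or None
    if the frequency falls in a gap between the listed bands."""
    for name, lo, hi in BANDS:
        if lo <= freq_mhz <= hi:
            return name
    return None

print(classify(7.2))    # a popular amateur shortwave frequency
print(classify(101.1))  # an FM broadcast frequency, inside VHF
```

<p>For example, <code>classify(7.2)</code> returns the HF label while <code>classify(2.0)</code> returns <code>None</code>, since 2 MHz falls in the gap between the AM broadcast band and shortwave.</p>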
<p>See also: <a href="https://en.wikipedia.org/wiki/Broadcast_band">Broadcast band</a>.</p>
<h3 id="6-books--research-papers">6. Books & Research Papers</h3>
<ul>
<li>Eric E. Johnson, Erik Koski, and William N. Furman. “Third-generation and wideband HF radio communications.” Artech House, 2013. <a href="https://books.google.com/books?id=luqEtNNsciMC&lpg=PR9&ots=6KLEMv2Syt&dq=Third-Generation%20and%20Wideband%20HF%20Radio%20Communications&lr&pg=PR9#v=onepage&q&f=false">Google Books</a></li>
<li>Jinlong Wang, Guoru Ding, and Haichao Wang, “HF communications: Past, present, and future,” in China Communications, vol. 15, no. 9, pp. 1-9, Sept. 2018, doi: <a href="https://ieeexplore.ieee.org/document/8456447">10.1109/CC.2018.8456447</a>.</li>
<li>Hervás M, Bergadà P, Alsina-Pagès RM. “Ionospheric Narrowband and Wideband HF Soundings for Communications Purposes: A Review.” Sensors (Basel). 2020;20(9):2486. Published 2020 Apr 28. doi:<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7273218/">10.3390/s20092486</a></li>
<li>Witvliet, B.A., Alsina-Pagès, R.M. “Radio communication via Near Vertical Incidence Skywave propagation: an overview.” Telecommun Syst 66, 295–309 (2017). <a href="https://doi.org/10.1007/s11235-017-0287-2">https://doi.org/10.1007/s11235-017-0287-2</a></li>
<li>Jian Wang, Yafei Shi, Cheng Yang, Feng Feng, “A review and prospects of operational frequency selecting techniques for HF radio communication,” Advances in Space Research, 2022, ISSN 0273-1177, <a href="https://doi.org/10.1016/j.asr.2022.01.026">https://doi.org/10.1016/j.asr.2022.01.026</a>.</li>
<li>Xianglong Yu, An-An Lu, Xiqi Gao, Geoffrey Ye Li, Guoru Ding, and Cheng-Xiang Wang, “HF Skywave Massive MIMO Communication,” in IEEE Transactions on Wireless Communications, doi: <a href="https://ieeexplore.ieee.org/document/9559764">10.1109/TWC.2021.3115820</a>.</li>
<li>Toros Arikan and Andrew C. Singer, “Receiver Designs for Low-Latency HF Communications,” in IEEE Transactions on Wireless Communications, vol. 20, no. 5, pp. 3005-3015, May 2021, doi: <a href="https://ieeexplore.ieee.org/document/9311865">10.1109/TWC.2020.3046475</a>.</li>
<li>Zhiqiang Qin, Jinlong Wang, Jin Chen, Guoru Ding, Yu-Dong Yao, Xinsheng Ji, and Xiangming Chen, “Link Quality Analysis Based Channel Selection in High-Frequency Asynchronous Automatic Link Establishment: A Matrix Completion Approach,” in IEEE Systems Journal, vol. 12, no. 2, pp. 1957-1968, June 2018, doi: <a href="https://ieeexplore.ieee.org/document/7962279">10.1109/JSYST.2017.2717702</a>.</li>
<li>Jian Wang, Cheng Yang and Wenxing An, “Regional Refined Long-term Predictions Method of Usable Frequency for HF Communication Based on Machine Learning over Asia,” in IEEE Transactions on Antennas and Propagation, doi: <a href="https://ieeexplore.ieee.org/document/9540339">10.1109/TAP.2021.3111634</a>.</li>
</ul>Zhongyuan Zhaozhongyuan.zhao@rice.eduhttp://zhongyuanzhao.comSince the escalation of the Russo-Ukrainian War on Feb. 24, 2022, high frequency (HF) radio has made headlines and sparked some discussions in the amateur radio community. With low bandwidth (2.5–10 kHz per HF channel), HF radio can only be used for digital or analog voice and e-mail/chat/text messaging. However, if the Internet, satellites, and telecom networks were all blocked in a war or natural disaster, HF radio would be the only means of long-distance communication that does not rely on any infrastructure.I recommend this lecture from Stanford SNAP: “Machine Learning with Graphs”2021-02-07T00:00:00-06:002021-02-07T00:00:00-06:00https://zhongyuanzhao.com/posts/2021/02/stanford-machine-learning-with-graphs<p>For anyone interested in graphs, I highly recommend this lecture series by Jure Leskovec and Michele Catasta from the <a href="http://snap.stanford.edu/">Stanford Network Analysis Project</a> (SNAP). Check out the <a href="https://web.stanford.edu/class/cs224w/">course home page</a>.</p>
<p>Here is a <a href="https://www.youtube.com/watch?v=uEPPnR22fxg&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=1">playlist</a> of lectures from the archive of <a href="http://snap.stanford.edu/class/cs224w-2019/">CS224W fall 2019</a>, where slides are available.</p>
<ul>
<li><a href="https://www.youtube.com/watch?v=uEPPnR22fxg&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=1">Lecture 1 Introduction; Structure of Graphs</a></li>
<li><a href="https://www.youtube.com/watch?v=erMiEFGRsIk&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=2">Lecture 2 Properties of Networks And Random Graph Models</a></li>
<li><a href="https://www.youtube.com/watch?v=sdpqpj8g6YY&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=3">Lecture 3 Motifs and Structural Roles in Networks</a></li>
<li><a href="https://www.youtube.com/watch?v=Q7CHFo8UdPU&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=4">Lecture 4 Community Structure in Networks</a></li>
<li><a href="https://www.youtube.com/watch?v=VIu-ORmRspA&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=5">Lecture 5 Spectral Clustering</a></li>
<li><a href="https://www.youtube.com/watch?v=hTV44YH8Hd0&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=6">Lecture 6 Message Passing and Node Classification</a></li>
<li><a href="https://www.youtube.com/watch?v=4PTOhI8IWTo&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=7">Lecture 7 Graph Representation Learning</a></li>
<li><a href="https://www.youtube.com/watch?v=LdK9HzBAR8c&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=8">Lecture 8 Graph Neural Networks</a></li>
<li><a href="https://www.youtube.com/watch?v=X_fmiIy_YyI&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=9">Lecture 9 Graph Neural Networks Implementation with Pytorch Geometric</a></li>
<li><a href="https://www.youtube.com/watch?v=enyym0s94iY&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=10">Lecture 10 Deep Generative Models for Graphs</a></li>
<li><a href="https://www.youtube.com/watch?v=QD_NN6WUh9s&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=11">Lecture 11 Link Analysis - PageRank</a></li>
<li><a href="https://www.youtube.com/watch?v=50D4kA0gOPw&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=12">Lecture 12 Network Effects and Cascading Behavior</a></li>
<li><a href="https://www.youtube.com/watch?v=0VWQdbyFmtU&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=13">Lecture 13 Probabilistic Contagion and Models of Influence</a></li>
<li><a href="https://www.youtube.com/watch?v=hstYPmdW8PU&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=14">Lecture 14 Influence Maximization in Networks</a></li>
<li><a href="https://www.youtube.com/watch?v=fYOq5IX18JY&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=15">Lecture 15 Outbreak Detection in Networks</a></li>
<li><a href="https://www.youtube.com/watch?v=3pramEtovus&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=16">Lecture 16 Network Evolution</a></li>
<li><a href="https://www.youtube.com/watch?v=izK_u0appck&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=17">Lecture 17 Reasoning over Knowledge Graphs</a></li>
<li><a href="https://www.youtube.com/watch?v=BqZWbRivm8g&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=18">Lecture 18 Limitations of Graph Neural Networks</a></li>
<li><a href="https://www.youtube.com/watch?v=p2aqXKfRXEA&list=PL-Y8zK4dwCrQyASidb2mjj_itW2-YYx6-&index=19">Lecture 19 Applications of Graph Neural Networks</a></li>
</ul>
<p>Recommended reading from the <a href="https://web.stanford.edu/class/cs224w/">course page</a></p>
<ul>
<li><a href="https://www.cs.mcgill.ca/~wlh/grl_book/">Graph Representation Learning</a> by William L. Hamilton</li>
<li><a href="http://www.cs.cornell.edu/home/kleinber/networks-book/">Networks, Crowds, and Markets: Reasoning About a Highly Connected World</a> by David Easley and Jon Kleinberg</li>
<li><a href="http://networksciencebook.com/">Network Science</a> by Albert-László Barabási</li>
</ul>Zhongyuan Zhaozhongyuan.zhao@rice.eduhttp://zhongyuanzhao.comFor anyone interested in graphs, I highly recommend this lecture series by Jure Leskovec and Michele Catasta from the Stanford Network Analysis Project (SNAP). Check out the course home page.