<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Konstantin Kirchheim on kkirchheim.de</title><link>https://www.kkirchheim.de/</link><description>Recent content in Konstantin Kirchheim on kkirchheim.de</description><language>en-us</language><atom:link href="https://www.kkirchheim.de/feed.xml" rel="self" type="application/rss+xml"/><item><title>Improving Out-of-Distribution Detection with Markov Logic Networks</title><link>https://www.kkirchheim.de/papers/mln-ood/</link><pubDate>Fri, 06 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.kkirchheim.de/papers/mln-ood/</guid><description>&lt;p&gt;Our paper &lt;strong&gt;Improving Out-of-Distribution Detection with Markov Logic Networks&lt;/strong&gt; has been accepted at ICML.
In it, we propose a probabilistic extension of &lt;a href="https://www.kkirchheim.de/papers/logic-ood/"&gt;Out-of-Distribution Detection with Logical Reasoning&lt;/a&gt;, as well as a simple algorithm to mine logical constraints for OOD detection from a dataset.&lt;/p&gt;
&lt;h3 id="abstract"&gt;
Abstract
&lt;a href="#abstract"&gt;§&lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;Out-of-distribution (OOD) detection is essential for ensuring the reliability of deep learning models operating in open-world scenarios. Current OOD detectors mainly rely on statistical models to identify unusual patterns in the latent representations of a deep neural network. This work proposes to augment existing OOD detectors with probabilistic reasoning, utilizing Markov logic networks (MLNs). MLNs connect first-order logic with probabilistic reasoning to assign probabilities to inputs based on weighted logical constraints defined over human-understandable concepts, which offers improved explainability. Through extensive experiments on multiple datasets, we demonstrate that MLNs can significantly enhance the performance of a wide range of existing OOD detectors while maintaining computational efficiency. Furthermore, we introduce a simple algorithm for learning logical constraints for OOD detection from a dataset and showcase its effectiveness.&lt;/p&gt;</description><content:encoded><![CDATA[<p>Our paper <strong>Improving Out-of-Distribution Detection with Markov Logic Networks</strong> has been accepted at ICML.
In it, we propose a probabilistic extension of <a href="/papers/logic-ood/">Out-of-Distribution Detection with Logical Reasoning</a>, as well as a simple algorithm to mine logical constraints for OOD detection from a dataset.</p>
<h3 id="abstract">
  Abstract
  <a href="#abstract">§</a>
</h3>

<p>Out-of-distribution (OOD) detection is essential for ensuring the reliability of deep learning models operating in open-world scenarios. Current OOD detectors mainly rely on statistical models to identify unusual patterns in the latent representations of a deep neural network. This work proposes to augment existing OOD detectors with probabilistic reasoning, utilizing Markov logic networks (MLNs). MLNs connect first-order logic with probabilistic reasoning to assign probabilities to inputs based on weighted logical constraints defined over human-understandable concepts, which offers improved explainability. Through extensive experiments on multiple datasets, we demonstrate that MLNs can significantly enhance the performance of a wide range of existing OOD detectors while maintaining computational efficiency. Furthermore, we introduce a simple algorithm for learning logical constraints for OOD detection from a dataset and showcase its effectiveness.</p>
<figure><a href="/pdf/mln-ood-poster.pdf"><img src="/img/thumbs/mln-ood-poster.webp"
    alt="Poster (PDF)" width="250px"></a><figcaption>
      <p>Poster (PDF)</p>
    </figcaption>
</figure>

]]></content:encoded></item><item><title>Out-of-Distribution Detection with Adversarial Outlier Exposure</title><link>https://www.kkirchheim.de/papers/aoe/</link><pubDate>Fri, 06 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.kkirchheim.de/papers/aoe/</guid><description>&lt;p&gt;Our paper &lt;strong&gt;Out-of-Distribution Detection with Adversarial Outlier Exposure&lt;/strong&gt; has been accepted at the CVPR workshop for Safe Artificial Intelligence for All Domains (SAIAD).&lt;/p&gt;
&lt;p&gt;The experiments in the paper were mostly conducted by Thomas Botschen, who is currently a master's student at our lab.&lt;/p&gt;
&lt;h3 id="abstract"&gt;
Abstract
&lt;a href="#abstract"&gt;§&lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;Machine learning models typically perform reliably only on inputs drawn from the distribution they were trained on, making Out-of-Distribution (OOD) detection essential for safety-critical applications. While exposing models to example outliers during training is one of the most effective ways to enhance OOD detection, recent studies suggest that synthetically generated outliers can also act as regularizers for deep neural networks. In this paper, we propose an augmentation scheme for synthetic outliers that regularizes a classifier’s energy function by adversarially lowering the outliers’ energy during training. We demonstrate that our method improves OOD detection performance and adversarial robustness on OOD data on several image classification benchmarks. Additionally, we show that our approach preserves in-distribution generalization.&lt;/p&gt;</description><content:encoded><![CDATA[<p>Our paper <strong>Out-of-Distribution Detection with Adversarial Outlier Exposure</strong> has been accepted at the CVPR workshop for Safe Artificial Intelligence for All Domains (SAIAD).</p>
<p>The experiments in the paper were mostly conducted by Thomas Botschen, who is currently a master's student at our lab.</p>
<h3 id="abstract">
  Abstract
  <a href="#abstract">§</a>
</h3>

<p>Machine learning models typically perform reliably only on inputs drawn from the distribution they were trained on, making Out-of-Distribution (OOD) detection essential for safety-critical applications. While exposing models to example outliers during training is one of the most effective ways to enhance OOD detection, recent studies suggest that synthetically generated outliers can also act as regularizers for deep neural networks. In this paper, we propose an augmentation scheme for synthetic outliers that regularizes a classifier’s energy function by adversarially lowering the outliers’ energy during training. We demonstrate that our method improves OOD detection performance and adversarial robustness on OOD data on several image classification benchmarks. Additionally, we show that our approach preserves in-distribution generalization.</p>
<!-- <figure><img src="/img/aoe/aoe-concept.webp"
    alt="Concept of adversarial outlier exposure" width="50%"><figcaption>
      <p>Concept of adversarial outlier exposure</p>
    </figcaption>
</figure>
 -->
<figure><a href="/pdf/aoe-poster.pdf"><img src="/img/thumbs/aoe-poster.webp"
    alt="Poster for AOE (PDF)." width="250px"></a><figcaption>
      <p>Poster for AOE (PDF).</p>
    </figcaption>
</figure>

]]></content:encoded></item><item><title>Invited Talk: Knowledge Discovery in Large Datasets using LLMs</title><link>https://www.kkirchheim.de/blog/impact-and-social-work/</link><pubDate>Mon, 24 Mar 2025 00:00:00 +0000</pubDate><guid>https://www.kkirchheim.de/blog/impact-and-social-work/</guid><description>&lt;p&gt;I was invited to give a talk on the topic &lt;em&gt;&amp;ldquo;Knowledge Discovery in Large Datasets using LLMs&amp;rdquo;&lt;/em&gt; at the 2nd &lt;a href="https://www.fhnw.ch/plattformen/wirkungen/"&gt;Conference on Impact and Social Work&lt;/a&gt;.
In the presentation, I tried to make the case that current LLMs, like ChatGPT, allow people with limited background in programming to analyze datasets, thereby contributing to a &amp;ldquo;democratization&amp;rdquo; of data science.&lt;br&gt;
The talk was basically a walk-through of a small data analysis project where I did not write a single line of the code myself, instead relying solely on ChatGPT pro.&lt;/p&gt;</description><content:encoded><![CDATA[<p>I was invited to give a talk on the topic <em>&ldquo;Knowledge Discovery in Large Datasets using LLMs&rdquo;</em> at the 2nd <a href="https://www.fhnw.ch/plattformen/wirkungen/">Conference on Impact and Social Work</a>.
In the presentation, I tried to make the case that current LLMs, like ChatGPT, allow people with limited background in programming to analyze datasets, thereby contributing to a &ldquo;democratization&rdquo; of data science.<br>
The talk was basically a walk-through of a small data analysis project where I did not write a single line of the code myself, instead relying solely on ChatGPT pro.</p>
<p>There are, of course, some caveats: at some level of complexity, the LLM makes mistakes, and without sufficient knowledge of coding, it becomes basically impossible to detect them.</p>
]]></content:encoded></item><item><title>On the Implementation of AI Ethics</title><link>https://www.kkirchheim.de/blog/ai-ethics/</link><pubDate>Mon, 24 Feb 2025 00:00:00 +0000</pubDate><guid>https://www.kkirchheim.de/blog/ai-ethics/</guid><description>&lt;div class="note"&gt;
This is a term paper (originally in German) that I wrote in 2019 (in a pre-LLM era) for a seminar on the philosophical aspects of AI. It discusses general strategies for implementing ethical behavior in AI systems, using autonomous vehicles as an example. While somewhat outdated, it still constitutes a reasonable introduction to the topic.
&lt;/div&gt;
&lt;h2 id="einleitung"&gt;
Introduction
&lt;a href="#einleitung"&gt;§&lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;&lt;span class="dropcap"&gt;F&lt;/span&gt;&lt;span class="dropcap-rest"&gt;or a long time&lt;/span&gt;, human individuals and societies were regarded as the only intelligent decision-makers. Advances in computer science, particularly in artificial intelligence and machine learning, have produced autonomously acting machines that increasingly challenge humans for this position. While these autonomous systems have so far been deployed only in narrowly delimited domains, they are expected to permeate all areas of social life in the future. The spectrum ranges from care robots to military robots. Driverless cars, which are already being tested in practice, currently form the vanguard of this generation of new, intelligent systems.&lt;/p&gt;</description><content:encoded><![CDATA[<div class="note">
    This is a term paper (originally in German) that I wrote in 2019 (in a pre-LLM era) for a seminar on the philosophical aspects of AI. It discusses general strategies for implementing ethical behavior in AI systems, using autonomous vehicles as an example. While somewhat outdated, it still constitutes a reasonable introduction to the topic.
</div>
<h2 id="einleitung">
  Introduction
  <a href="#einleitung">§</a>
</h2>

<p><span class="dropcap">F</span><span class="dropcap-rest">or a long time</span>, human individuals and societies were regarded as the only intelligent decision-makers. Advances in computer science, particularly in artificial intelligence and machine learning, have produced autonomously acting machines that increasingly challenge humans for this position. While these autonomous systems have so far been deployed only in narrowly delimited domains, they are expected to permeate all areas of social life in the future. The spectrum ranges from care robots to military robots. Driverless cars, which are already being tested in practice, currently form the vanguard of this generation of new, intelligent systems.</p>
<p>The emergence of this new class of decision-makers casts a different light on some older ethical questions, such as the <a href="https://en.wikipedia.org/wiki/Trolley_problem">trolley problem</a>. Moral philosophy has dealt with such dilemmas for a long time, but mostly under the assumption that the moral agents are human. It has turned out that popular approaches such as consequentialism and deontological ethics can be transferred to non-human agents relatively easily. In practical applications, however, the question arises of how ethical behavior can be implemented in non-human agents.<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup></p>
<p>In the following, known implementation strategies for moral behavior are examined using self-driving cars as an example. To this end, we first explain why unwritten ethical rules matter for artificial systems and how they relate to explicit rules (e.g., road traffic regulations). We then present the moral-philosophical theories that form the basis of most implementations. This is followed by a selection of implementation strategies, which are discussed at the end.</p>
<h3 id="abgrenzung">
  Scope
  <a href="#abgrenzung">§</a>
</h3>

<p>To limit the scope of this paper, we first point out some aspects of machine ethics that will not be considered further. In particular, we do not discuss whether machines, from a philosophical perspective, possess the fundamental capacity for moral behavior. From the standpoint of Kantian ethics, an agent is only capable of acting morally if it (1) could have decided otherwise and (2) “consciously” makes the morally right decision. Robots satisfy both criteria only to a limited extent. On the one hand, it is debatable whether and to what degree machines in particular possess free will that would enable them to actually decide. On the other hand, it is unclear whether and how machines can be aware of such a decision. Neither free will nor consciousness will therefore be discussed here.</p>
<h3 id="moralische-regeln">
  Moral Rules
  <a href="#moralische-regeln">§</a>
</h3>

<p>In the following, we argue for the necessity of ethically acting machines. Such an argument cannot, however, rest on a moral evaluation, since it would then be valid only under the particular moral perspective applied.<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup> To avoid circular reasoning, we take a market-economy perspective instead.<sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup></p>
<p>According to Maurer et al.<sup id="fnref1:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup>, autonomous vehicles, or rather the algorithms that control them, are not judged on the basis of statistics or tests, but by the ethical and moral standards of society. A car running amok would be a PR disaster for manufacturers, whereas cars that behave in a socially exemplary manner can increase social acceptance (and thus possibly also sales figures). So when a car causes an accident, it is necessary that it made a decision that society perceives as morally justifiable under the circumstances. It will not suffice to merely comply with the explicit legal statutes while ignoring unwritten social norms. This would prevent a conviction in court, but not a condemnation by society.</p>
<p>Regardless of whether machines can be culpable, the blame for decisions perceived as immoral will be placed on the manufacturers, because a car, which will not be regarded as an entity with free will in the foreseeable future, can hardly be held responsible for its own misconduct in the eyes of society. For these reasons, it makes sense from a market-economy perspective to implement behavior that at least appears moral from the outside.</p>
<h3 id="verkehrsregeln">
  Traffic Rules
  <a href="#verkehrsregeln">§</a>
</h3>

<p>As established above, it is advantageous for car manufacturers to integrate moral considerations into the decision logic of their autonomous vehicles. What remains open is how traffic rules should be handled. Maurer et al.<sup id="fnref2:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup> claim that, in practice, traffic rules are not actual rules that must be obeyed unconditionally, but rather guidelines that have to be weighed against other factors such as road safety, traffic flow, or general expediency. For example, one of Google's self-driving cars had difficulties at four-way intersections in practical tests, because the other, human drivers did not stop long enough before proceeding, while the Google car had been programmed to be considerate and let the other drivers pass without ever setting off itself. As a consequence, the cars were allowed to break traffic rules when their user commanded them to.</p>
<p>In general, it would be disadvantageous for car manufacturers if strict adherence to traffic rules led their cars to, for example, hold up the flow of traffic, because this could reduce social acceptance. A practical implementation should therefore weigh the interests of the different road users against each other, instead of treating traffic rules as rules that must be obeyed in every case.</p>
<h2 id="moralphilosophische-theorien">
  Theories of Moral Philosophy
  <a href="#moralphilosophische-theorien">§</a>
</h2>

<p>This section presents some theories of moral philosophy that could be suitable for producing a computable mathematical description of morality. The structure chosen here follows Allen et al.<sup id="fnref:4"><a href="#fn:4" class="footnote-ref" role="doc-noteref">4</a></sup>, although it should be noted that a strict classification is hard to achieve because the boundaries are blurry.</p>
<h3 id="konsequentialismus">
  Consequentialism
  <a href="#konsequentialismus">§</a>
</h3>

<p>Consequentialism is the view that the normative assessment of an action depends only on its consequences. Consequentialist ethics include, for example, utilitarianism and ethical egoism.</p>
<h4 id="utilitarismus">
  Utilitarianism
  <a href="#utilitarismus">§</a>
</h4>

<p>Utilitarianism deems good the decision whose consequences yield the greatest possible aggregate utility for all individuals. Accordingly, utilitarianism can be understood as an optimization problem.</p>
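<p>As a minimal sketch of this reading (the notation is an assumption introduced for illustration and does not come from the cited literature): let $A$ be a finite set of available actions, let $i = 1, \dots, n$ index the affected individuals, and let $u_i(a)$ be a hypothetical utility that individual $i$ derives from the consequences of action $a$. The utilitarian choice would then be
$$
a^{*} = \arg\max_{a \in A} \sum_{i=1}^{n} u_i(a)
$$
How the utilities $u_i$ are measured and made comparable across individuals is left open in this sketch.</p>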
<h4 id="ethischer-egoismus">
  Ethical Egoism
  <a href="#ethischer-egoismus">§</a>
</h4>

<p>The theory of ethical egoism regards as good those actions that maximally benefit the acting individual. Like utilitarianism, it can be understood as an optimization problem, but it takes the interests of others into account only insofar as doing so serves the agent's own interests.</p>
<h3 id="deontologische-ethik">
  Deontological Ethics
  <a href="#deontologische-ethik">§</a>
</h3>

<p>In <a href="https://de.wikipedia.org/wiki/Deontologische_Ethik">deontological ethics</a> (or duty ethics), the normative assessment of an action is not made (only) on the basis of its consequences. In addition, certain actions can count as intrinsically good or bad (e.g., lying or killing), with the degree of absoluteness varying between different schools of thought. Moreover, conflicts between different rules can arise, for instance when someone ends up in a situation in which they must either kill or lie. How such conflicts are handled also differs between schools.</p>
<h4 id="kategorischer-imperativ">
  Categorical Imperative
  <a href="#kategorischer-imperativ">§</a>
</h4>

<p>Kant's <a href="https://de.wikipedia.org/wiki/Kategorischer_Imperativ">categorical imperative</a> can be understood as a duty in the sense of deontological ethics. It reads:</p>
<blockquote>
<p>&ldquo;Act only according to that maxim whereby you can at the same time will that it should become a universal law.&rdquo;</p>
</blockquote>
<p>A more detailed description can be found in the extensive literature on this topic.</p>
<h4 id="asimovsche-ethik">
  Asimovian Ethics
  <a href="#asimovsche-ethik">§</a>
</h4>

<p>In contrast to other theories, which were initially conceived for human agents, Asimovian ethics was formulated explicitly as a rule system for machines. Accordingly, it emphasizes the difference between humans and machines and gives priority to humans.<sup id="fnref:5"><a href="#fn:5" class="footnote-ref" role="doc-noteref">5</a></sup> The rules are:</p>
<ol>
<li>A robot may not (knowingly) injure a human being or, through inaction, (knowingly) allow a human being to come to harm.</li>
<li>A robot must obey the orders given to it by a human being, unless such an order would conflict with rule one.</li>
<li>A robot must protect its own existence, as long as such protection does not conflict with rule one or two.</li>
</ol>
<p>Asimov addresses the possibility of conflicts between the rules through strict prioritization. These laws are probably not sufficient for road traffic. Nevertheless, they have had a great influence on the field of machine ethics, and several implementations exist.<sup id="fnref:6"><a href="#fn:6" class="footnote-ref" role="doc-noteref">6</a></sup><sup id="fnref:7"><a href="#fn:7" class="footnote-ref" role="doc-noteref">7</a></sup> The Asimovian laws of robotics are therefore mentioned here as the prototype of a strictly hierarchical deontological ethics.</p>
<h3 id="tugendethik">
  Virtue Ethics
  <a href="#tugendethik">§</a>
</h3>

<p>Virtue ethics judges as good those actions that are virtuous or at least arise from good will. A literature search did not turn up any concrete implementation attempts. According to Allen et al.<sup id="fnref:8"><a href="#fn:8" class="footnote-ref" role="doc-noteref">8</a></sup>, virtue ethics can be reduced to deontological ethics by formulating virtues as duties. Abstract virtues, however, are difficult to capture mathematically. Virtue ethics is therefore not considered further in the following.</p>
<h2 id="implementierungsstrategien">
  Implementation Strategies
  <a href="#implementierungsstrategien">§</a>
</h2>

<p>Different sources propose different taxonomies of implementation approaches.<sup id="fnref:9"><a href="#fn:9" class="footnote-ref" role="doc-noteref">9</a></sup><sup id="fnref1:8"><a href="#fn:8" class="footnote-ref" role="doc-noteref">8</a></sup> Following Allen et al.<sup id="fnref1:4"><a href="#fn:4" class="footnote-ref" role="doc-noteref">4</a></sup>, some strategies can be classified as follows, although the boundaries can be fluid.</p>
<figure><img src="/img/ai-ethics/ethical-ai-survey.webp"
    alt="Rough classification of strategies for implementing moral AI (blue). Existing implementations are shown in red. No literature could be found on bottom-up hybrid methods. There are no known top-down/bottom-up hybrid implementations."><figcaption>
      <p>Rough classification of strategies for implementing moral AI (blue). Existing implementations are shown in red. No literature could be found on bottom-up hybrid methods. There are no known top-down/bottom-up hybrid implementations.</p>
    </figcaption>
</figure>

<h3 id="top-down">
  Top-Down
  <a href="#top-down">§</a>
</h3>

<p>In top-down approaches, one tries to design a system that solves a given task. Applied to moral AI, this means mapping a concrete moral philosophy onto an algorithm. While this procedure has proven too inflexible for “general” artificial intelligence, it can still be useful in certain specialized areas.<sup id="fnref2:4"><a href="#fn:4" class="footnote-ref" role="doc-noteref">4</a></sup></p>
<h4 id="konsequentialismus-1">
  Consequentialism
  <a href="#konsequentialismus-1">§</a>
</h4>

<p>An example of a consequentialist implementation is the “Ethical Layer.”<sup id="fnref1:7"><a href="#fn:7" class="footnote-ref" role="doc-noteref">7</a></sup> There, a moral agent runs internal simulations to estimate the consequences of its actions, which are then evaluated ethically. The approach is inspired by the simulation theory of cognition.<sup id="fnref:10"><a href="#fn:10" class="footnote-ref" role="doc-noteref">10</a></sup> Experiments show that an agent programmed in this way can obey Asimov's laws by avoiding, in all scenarios, actions that would violate them.</p>
<figure><img src="/img/ai-ethics/ethical-layer.webp"
    alt="Simplified depiction of the Ethical Layer architecture. The robot's controller derives possible actions from its current goals and sends them to the Ethical Layer. Its simulation module computes the expected states of the world for each action. The results of these simulations are then assessed against ethical criteria by an evaluation module and finally sent back to the controller." width="450px"><figcaption>
      <p>Simplified depiction of the Ethical Layer architecture. The robot's controller derives possible actions from its current goals and sends them to the Ethical Layer. Its simulation module computes the expected states of the world for each action. The results of these simulations are then assessed against ethical criteria by an evaluation module and finally sent back to the controller.</p>
    </figcaption>
</figure>
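
<p>The evaluation step of such an architecture can be sketched as follows. All symbols are illustrative assumptions and not the notation of the cited work: given the current state $s$, a set of candidate actions $A(s)$ proposed by the controller, a simulation model $\mathrm{sim}(s, a)$ that predicts the resulting state of the world, and an ethical evaluation function $E$ that scores predicted states, the layer would return a score for every candidate,
$$
e(a) = E\big(\mathrm{sim}(s, a)\big) \quad \text{for all } a \in A(s)
$$
A controller that simply follows this assessment would then select $a^{*} = \arg\max_{a \in A(s)} e(a)$. In general, the controller may combine the scores with its other goals.</p>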

<h4 id="deontologische-ethik-1">
  Deontological Ethics
  <a href="#deontologische-ethik-1">§</a>
</h4>

<p>An implementation of deontological ethics is described by Briggs and Scheutz.<sup id="fnref:11"><a href="#fn:11" class="footnote-ref" role="doc-noteref">11</a></sup> Here, an agent operates on a consistent set of rules $\varphi$. When it receives a task $\alpha$, it first checks whether it has an obligation $\mathrm{obl}$ to fulfill it and whether doing so would contradict its current goals. If not, it adopts fulfilling the task as a new goal. Formally, this can be expressed as follows:
$$
\mathrm{obl}(\alpha, \varphi) \wedge \neg \mathrm{per}(\alpha, \neg \varphi)\ \rightarrow  \mathrm{goal}(\alpha, \varphi)
$$</p>
<h4 id="hybrid-top-down">
  Hybrid (Top-Down)
  <a href="#hybrid-top-down">§</a>
</h4>

<p>Some authors hold that a single ethical theory is not enough to fully implement morality in autonomous vehicles.<sup id="fnref3:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup><sup id="fnref:12"><a href="#fn:12" class="footnote-ref" role="doc-noteref">12</a></sup> Hence, there are models that combine deontological and consequentialist elements. Maurer et al.<sup id="fnref4:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup>, for instance, describe an approach that mixes a cost function (utility) with inviolable rules. The cost function accounts for weighted sub-goals (e.g., traffic rules, the user's transport needs). It is computed for every potential trajectory of the vehicle, and the system then tries to minimize the cost. The strict rules, by contrast, must never be violated. This is meant to prevent, for example, the vehicle from running over a pedestrian merely because hard braking would be uncomfortable for the passengers. Dilemma situations are resolved through priorities among the rules. Mathematically, this is interpreted as optimizing the cost function subject to certain constraints.</p>
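<p>One way to write this down, assuming a set of candidate trajectories $\mathcal{T}$, sub-goal costs $c_k(\tau)$ with weights $w_k \ge 0$, and strict rules encoded as constraint functions $g_j$ (all names are illustrative assumptions, not the notation of the cited work), is
$$
\tau^{*} = \arg\min_{\tau \in \mathcal{T}} \sum_{k} w_k \, c_k(\tau) \quad \text{subject to} \quad g_j(\tau) \le 0 \ \text{ for all } j
$$
In this reading, the weighted sum captures the consequentialist part, while the constraints capture the deontological part. If no trajectory satisfies every constraint, the rule priorities would decide which constraint is relaxed first.</p>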
<h3 id="bottom-up">
  Bottom-Up
  <a href="#bottom-up">§</a>
</h3>

<p>Bottom-up approaches refrain from prescribing the moral philosophy. Instead, the system is built “from below” by gradually learning (or, in an evolutionary sense, “growing up”) how it should behave, without any particular moral theory being implemented explicitly.</p>
<h4 id="assoziatives-lernen">
  Associative Learning
  <a href="#assoziatives-lernen">§</a>
</h4>

<p>In associative learning, an agent acquires the expected moral behavior from feedback. Abel et al.<sup id="fnref1:9"><a href="#fn:9" class="footnote-ref" role="doc-noteref">9</a></sup> describe an example using reinforcement learning in the form of a Markov decision process. It is defined by</p>
<ul>
<li>states $s \in S$,</li>
<li>actions $a \in A$,</li>
<li>a reward function $R(s,a)$, and</li>
<li>a probability distribution $T(s,a,s')$ over state transitions from $s$ to $s'$, given action $a$.</li>
</ul>
<p>In each state $s$, the agent chooses an action $a$, receives a reward, and transitions to a successor state $s'$ according to $T$. The goal is to accumulate as much reward as possible over time. Experiments showed that the agent can thereby resolve certain dilemmas such as the “Cake or Death” problem.<sup id="fnref:13"><a href="#fn:13" class="footnote-ref" role="doc-noteref">13</a></sup></p>
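<p>The objective can be made explicit as follows, assuming a policy $\pi$ that maps states to actions and a discount factor $\gamma \in [0, 1)$. Both are standard in reinforcement learning, but they are assumptions here because the description above does not name them:
$$
\pi^{*} = \arg\max_{\pi} \ \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \right], \qquad a_t = \pi(s_t), \quad s_{t+1} \sim T(s_t, a_t, \cdot)
$$
The moral content of the learned behavior then depends entirely on how the reward function $R$ is specified or inferred from feedback.</p>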
<h4 id="soziobiologisch">
  Sociobiological
  <a href="#soziobiologisch">§</a>
</h4>

<p>Sociobiological approaches attempt to produce moral agents by simulating biological evolution. Many existing works focus on scenarios in which individuals are in direct competition.<sup id="fnref:14"><a href="#fn:14" class="footnote-ref" role="doc-noteref">14</a></sup> Here, cooperative behavior benefits them, which yields an evolutionary advantage. According to some, this “self-interested origin of morality” can also be observed in humans.<sup id="fnref3:4"><a href="#fn:4" class="footnote-ref" role="doc-noteref">4</a></sup> Since this ultimately promotes one's own survival within the group, it can be regarded as a form of ethical egoism. No such solution is known so far for use in autonomous vehicles, which is probably due to the high complexity of this application.</p>
<h4 id="hybrid-bottom-up">
  Hybrid (Bottom-Up)
  <a href="#hybrid-bottom-up">§</a>
</h4>

<p>It is also conceivable to combine bottom-up and top-down approaches.<sup id="fnref1:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup><sup id="fnref4:4"><a href="#fn:4" class="footnote-ref" role="doc-noteref">4</a></sup> For example, certain basic rules could be hard-coded and made inaccessible, while the system learns less rigid norms or priorities through experience. A comparison with humans is often drawn: genetic predispositions and social conditioning create the foundation on which individual moral beliefs form. Another possibility would be for a system to learn bottom-up to estimate the consequences of its actions before then making an ethical decision top-down.</p>
<h3 id="hybrid">
  Hybrid
  <a href="#hybrid">§</a>
</h3>

<p>Very rarely, the two principles are combined, for instance by coupling top-down modules with a bottom-up mechanism that automatically determines parameters (e.g., weights in cost functions). Such methods are certainly used in other fields (for example, evolutionary algorithms for hyperparameter optimization in deep learning).<sup id="fnref:15"><a href="#fn:15" class="footnote-ref" role="doc-noteref">15</a></sup> For autonomous driving, this remains largely unexplored.</p>
<h2 id="diskussion">
  Discussion
  <a href="#diskussion">§</a>
</h2>

<h3 id="top-down-1">
  Top-Down
  <a href="#top-down-1">§</a>
</h3>

<p>Consequentialist approaches run into limits in practice because, on the one hand, they can never have complete information about the world and, on the other hand, they would in theory have to look infinitely far into the future.<sup id="fnref2:8"><a href="#fn:8" class="footnote-ref" role="doc-noteref">8</a></sup> In practice, however, this problem is usually mitigated by simulating only up to a sensible time horizon or only over a limited number of alternative actions. Despite these limits, the Ethical Layer<sup id="fnref2:7"><a href="#fn:7" class="footnote-ref" role="doc-noteref">7</a></sup> shows no obvious drawbacks, although in real-time scenarios (e.g., an imminent accident) it can be difficult to simulate all potential actions quickly enough.</p>
<p>Deontological systems are often unambiguous to interpret in their logic and allow proofs of certain guarantees (e.g., that a rule is never violated). This is an important advantage in safety-critical applications. However, they prove inflexible, since they may not permit traffic rules to be broken when necessary, even if doing so would protect a higher good. They are also poorly suited to dilemmas in which all available options are “bad”. Some philosophers, such as Kant, try to cover different cases with a single, very complex rule (the categorical imperative), which in turn is hard to implement in practice.<sup id="fnref3:8"><a href="#fn:8" class="footnote-ref" role="doc-noteref">8</a></sup></p>
<p>Hybrid top-down systems try to combine deontological and consequentialist ideas. They can prescribe certain inviolable rules and minimize a cost function for other aspects. Maurer et al.<sup id="fnref5:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup> have discussed this for autonomous vehicles. However, the problem often shifts to the selection and weighting of the rules and cost functions. If these are not chosen “correctly”, society will not regard the result as morally acceptable.</p>
<h3 id="bottom-up-1">
  Bottom-Up
  <a href="#bottom-up-1">§</a>
</h3>

<p>The advantage of bottom-up approaches is that they do not “hard-code” any particular moral theory. They can therefore learn on their own how they should behave, even when society holds contradictory views. A disadvantage, at the same time, is that the decision-making of such systems is usually opaque (e.g., in neural networks). For safety-critical applications, however, one wants systems whose decisions can be traced. It remains to be seen how far research on explainability and transparency will get.<sup id="fnref:16"><a href="#fn:16" class="footnote-ref" role="doc-noteref">16</a></sup></p>
<h2 id="fazit">
  Conclusion
  <a href="#fazit">§</a>
</h2>

<p>This work has presented various strategies for implementing morality in autonomous systems, using self-driving cars as the specific example. It explained why unwritten moral rules matter alongside written traffic laws. It also outlined several moral philosophies that can serve as a basis for implementations, in particular consequentialism and deontological ethics. The strategies presented were divided into top-down, bottom-up, and hybrid approaches, although the boundaries between them are fluid. After a discussion of strengths and weaknesses, the Ethical Layer appears especially promising, since it provides a generic, consequentialist-inspired architecture for moral decision-making that can also be adapted to specific requirements. In addition, hybrid variants can be constructed in which different moral theories or learning methods are combined as needed.</p>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>Vicky Charisi, Louise Dennis, Michael Fisher, Robert Lieck, Andreas Matthias, Marija Slavkovik, Janina Sombetzki, Alan F. T. Winfield, and Roman Yampolskiy. <em>Towards moral autonomous systems.</em> arXiv preprint arXiv:1703.04741, 2017.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a>&#160;<a href="#fnref1:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:2">
<p>Markus Maurer, J. Christian Gerdes, Barbara Lenz, and Hermann Winner. <em>Autonomes Fahren: Technische, rechtliche und gesellschaftliche Aspekte.</em> Springer-Verlag, 2015.&#160;<a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a>&#160;<a href="#fnref1:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a>&#160;<a href="#fnref2:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a>&#160;<a href="#fnref3:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a>&#160;<a href="#fnref4:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a>&#160;<a href="#fnref5:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:3">
<p>Ibid. (cf. the argument given there)&#160;<a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:4">
<p>Colin Allen, Iva Smit, and Wendell Wallach. <em>Artificial morality: Top-down, bottom-up, and hybrid approaches.</em> Ethics and Information Technology 7(3):149–155, 2005.&#160;<a href="#fnref:4" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a>&#160;<a href="#fnref1:4" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a>&#160;<a href="#fnref2:4" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a>&#160;<a href="#fnref3:4" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a>&#160;<a href="#fnref4:4" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:5">
<p>Susan Leigh Anderson. <em>Asimov’s “Three Laws of Robotics” and machine metaethics.</em> AI &amp; Society, 22(4):477–493, 2008.&#160;<a href="#fnref:5" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:6">
<p>Mateo Alvarez, Øyvind Berge, Audun Berget, Eirin Bjørknes, Dag V. K. Johnsen, Fredrik Madsen, and Marija Slavkovik. <em>Implementing Asimov’s First Law of Robotics.</em> 30th Norsk Informatikkonferanse, NIK, pages 27–29, 2017.&#160;<a href="#fnref:6" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:7">
<p>Dieter Vanderelst and Alan Winfield. <em>An architecture for ethical robots inspired by the simulation theory of cognition.</em> Cognitive Systems Research, 48:56–66, 2018.&#160;<a href="#fnref:7" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a>&#160;<a href="#fnref1:7" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a>&#160;<a href="#fnref2:7" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:8">
<p>Colin Allen, Gary Varner, and Jason Zinser. <em>Prolegomena to any future artificial moral agent.</em> Journal of Experimental &amp; Theoretical Artificial Intelligence, 12(3):251–261, 2000.&#160;<a href="#fnref:8" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a>&#160;<a href="#fnref1:8" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a>&#160;<a href="#fnref2:8" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a>&#160;<a href="#fnref3:8" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:9">
<p>David Abel, James MacGlashan, and Michael L. Littman. <em>Reinforcement learning as a framework for ethical decision making.</em> In Workshops at the Thirtieth AAAI Conference on Artificial Intelligence, 2016.&#160;<a href="#fnref:9" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a>&#160;<a href="#fnref1:9" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:10">
<p>Simulation theory of cognition (general concept in cognitive science).&#160;<a href="#fnref:10" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:11">
<p>Gordon Michael Briggs and Matthias Scheutz. <em>“Sorry, I can’t do that”: Developing mechanisms to appropriately reject directives in human-robot interactions.</em> In 2015 AAAI fall symposium series, 2015.&#160;<a href="#fnref:11" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:12">
<p>Noah J. Goodall. <em>Machine ethics and automated vehicles.</em> In Road Vehicle Automation, pages 93–102. Springer, 2014.&#160;<a href="#fnref:12" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:13">
<p>Stuart Armstrong. <em>Motivated value selection for artificial agents.</em> In Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.&#160;<a href="#fnref:13" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:14">
<p>Robert Axelrod and William D. Hamilton. <em>The evolution of cooperation.</em> Science, 211(4489):1390–1396, 1981.&#160;<a href="#fnref:14" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:15">
<p>Steven R. Young, Derek C. Rose, Thomas P. Karnowski, Seung-Hwan Lim, and Robert M. Patton. <em>Optimizing deep learning hyper-parameters through an evolutionary algorithm.</em> In Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments, page 4. ACM, 2015.&#160;<a href="#fnref:15" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:16">
<p>Kevin Baum, Holger Hermanns, and Timo Speith. <em>From machine ethics to machine explainability and back.</em> In ISAIM, 2018.&#160;<a href="#fnref:16" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>
]]></content:encoded></item><item><title>Home Server Setup 2025</title><link>https://www.kkirchheim.de/blog/homelab25/</link><pubDate>Sun, 26 Jan 2025 00:00:00 +0000</pubDate><guid>https://www.kkirchheim.de/blog/homelab25/</guid><description>&lt;p&gt;&lt;span class="dropcap"&gt;I&lt;/span&gt;&lt;span class="dropcap-rest"&gt;n this post,&lt;/span&gt; I want to present my current home server setup, including the hardware, the virtualized infrastructure (Networks, VMs), and the services (Containers) I am running.&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;
The goal is to give you some inspiration and also to have some more thorough documentation for myself.
While writing, I noticed some possible improvements, so there is value in the documentation process itself.&lt;/p&gt;
&lt;p&gt;This post will be quite long as the infrastructure evolved over a prolonged period.
To avoid convoluting it unnecessarily, I will only cover the most relevant subsystems and provide &lt;code&gt;docker-compose.yaml&lt;/code&gt; files for some services.&lt;/p&gt;</description><content:encoded><![CDATA[<p><span class="dropcap">I</span><span class="dropcap-rest">n this post,</span> I want to present my current home server setup, including the hardware, the virtualized infrastructure (Networks, VMs), and the services (Containers) I am running.<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>
The goal is to give you some inspiration and also to have some more thorough documentation for myself.
While writing, I noticed some possible improvements, so there is value in the documentation process itself.</p>
<p>This post will be quite long as the infrastructure evolved over a prolonged period.
To avoid convoluting it unnecessarily, I will only cover the most relevant subsystems and provide <code>docker-compose.yaml</code> files for some services.</p>
<h2 id="physical-devices">
  Physical Devices
  <a href="#physical-devices">§</a>
</h2>

<p>While there are many physical computers in my home, like my personal computer, numerous <a href="https://en.wikipedia.org/wiki/ESP8266">ESP8266 microcontrollers</a>, and some Raspberry Pis that I use for playing music, as an alarm clock, or to control the 3D printer, the following are the most relevant to my infrastructure. Most of the devices are mounted in a 24-unit server rack. Those that cannot be mounted simply sit on rack shelves.</p>
<p>All of the devices combined draw $\approx$ 130W, depending on utilization.
At the current energy price, this amounts to roughly 30€ per month.</p>
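<p>As a rough cross-check, assuming a constant draw of 130W and a 30-day month (both simplifications), the monthly energy use and the electricity price implied by the 30€ figure are:</p>
<p>$$0.13\,\mathrm{kW} \times 24\,\mathrm{h} \times 30 = 93.6\,\mathrm{kWh}, \qquad \frac{30\,\text{€}}{93.6\,\mathrm{kWh}} \approx 0.32\,\text{€}/\mathrm{kWh}$$</p>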
<h3 id="hypervisor">
  Hypervisor
  <a href="#hypervisor">§</a>
</h3>

<p>Currently, I only use a single server (running <a href="https://en.wikipedia.org/wiki/Proxmox_Virtual_Environment">Proxmox</a>) as a virtualization platform.
This server is assembled from spare consumer components left over from old PCs. It has been upgraded several times with new disks and more RAM. I packaged it into a rack-mountable chassis four units in height.</p>
<ul>
<li>OS: Proxmox</li>
<li>RAM: 128GB DDR4</li>
<li>CPU: Ryzen 7 1700, 8 cores @3.2GHz</li>
</ul>
<p>I used to have a multi-node Proxmox cluster, but at some point, I could no longer justify the increased power consumption, given that it brought almost no benefit.</p>
<figure class="figure-right"><img src="/img/homelab24/homelab24.webp"
    alt="My rack" width="300px"><figcaption>
      <p>My rack</p>
    </figcaption>
</figure>

<h3 id="router">
  Router
  <a href="#router">§</a>
</h3>

<p>While it would be possible to virtualize the router, this is generally discouraged: you do not want a server reboot to also bring down your network. Also, sometimes, the hypervisor might crash, and while this is bad enough, it gets worse if the server takes down the network as well (which you might need to fix the server).</p>
<p>So, I decided to buy a dedicated physical computer for the router: an APU.2E4.</p>
<ul>
<li>OS: <a href="https://en.wikipedia.org/wiki/OPNsense">OPNsense</a></li>
<li>CPU: 4 cores (AMD) @1GHz</li>
<li>RAM: 4GB</li>
<li>Disk: 16GB SSD</li>
</ul>
<p>The power draw is about 6-12W.</p>
<p><strong>Public Key Infrastructure</strong> OPNsense can be used to manage a Public Key Infrastructure (PKI) with a graphical user interface, which is very convenient. The PKI can issue certificates for self-hosted web services or keys for authentication.</p>
<p><strong>VPN</strong> The router also runs OpenVPN. The authentication uses the PKI, so each client has its own private key, which it uses to access the VPN. Exposing only a small number of the services to the outside world helps to reduce the attack surface. All other services can still be accessed via VPN.</p>
<h3 id="switch">
  Switch
  <a href="#switch">§</a>
</h3>

<p>I use a MikroTik CRS326-24G-2S+, a managed switch with 24x RJ45 ports running at 1Gbit/s and two additional SFP+ ports.
I could not yet convince myself that it is necessary, but at some point, I want to use the SFP+ ports and upgrade my internal network to 10Gbit/s.
At the moment, this would not make much sense, as throughput is usually limited by the HDDs' I/O.</p>
<ul>
<li>OS: RouterOS</li>
</ul>
<p>This low-power and very quiet switch does not support Power over Ethernet (PoE). Therefore, I use another cheap switch with 4 RJ45 ports for PoE.</p>
<h3 id="wifi-access-point">
  WiFi Access Point
  <a href="#wifi-access-point">§</a>
</h3>

<p>As WiFi AP, I use a <em>UniFi U6 Pro</em> that allows me to run three isolated WiFi networks:</p>
<ul>
<li><strong>Main WiFi</strong>: for devices I trust</li>
<li><strong>IoT WiFi</strong>: for all IoT devices (that I might not trust)</li>
<li><strong>Guest WiFi</strong>: for visitors</li>
</ul>
<p>This access point additionally requires the UniFi Controller software to run somewhere.
Buying this particular AP might not have been the best choice from a free (as in freedom) software standpoint, but it has been working without any issues since I bought it, so there is that.</p>
<p>This AP needs to be powered via Power over Ethernet (PoE).</p>
<h3 id="dns">
  DNS
  <a href="#dns">§</a>
</h3>

<p>I have a Raspberry Pi 3 running <a href="https://pi-hole.net/">Pi-hole</a> as the DNS server for my entire network.
This does not run inside a VM because I still want DNS to work if the primary node goes offline.
The router is configured to use this Pi as the DNS server for all requests that do not go to <code>*.local</code> addresses.</p>
<h2 id="vlans">
  VLANs
  <a href="#vlans">§</a>
</h2>

<p><a href="https://en.wikipedia.org/wiki/VLAN">Virtual LANs</a> allow you to create several virtual sub-network segments on top of a single physical LAN.
To a device in a VLAN, it looks as if it were in an isolated network.
The advantage of this is improved isolation: only machines in the same VLAN can communicate with each other directly.
All other traffic has to go through the router, where you have control over the firewall.</p>
<p>I set up a couple of VLANs in my home network, including:</p>
<ul>
<li><strong>External</strong>: Every machine exposed to the outside world is in this VLAN</li>
<li><strong>Internal</strong>: Every machine that runs services that do not have to be accessible from the outside is in this VLAN</li>
<li><strong>IoT</strong>: Everything related to home automation is in this VLAN</li>
<li><strong>Guests</strong>: Guests, who, for example, connect via the dedicated guest WiFi, will be assigned to this VLAN</li>
</ul>
<h2 id="storage">
  Storage
  <a href="#storage">§</a>
</h2>

<p>I use a 128GB SSD to store the OS and another 2TB NVMe SSD to back the VM disks.</p>
<h3 id="zfs-raid">
  ZFS RAID
  <a href="#zfs-raid">§</a>
</h3>

<p>Apart from this, I only recently switched to RAID-backed HDD storage for my data. Currently, I am using a 40TB <a href="https://en.wikipedia.org/wiki/ZFS">ZFS</a> RAID (built from 3x20TB disks), which is available to the network via <a href="https://en.wikipedia.org/wiki/Samba_(software)">Samba</a>. This RAID setup allows one of the three HDDs to fail without loss of data.</p>
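<p>A minimal sketch of how a pool with this layout could be created, assuming RAID-Z1 (single parity, which would match the capacity and the one-disk fault tolerance described above). The pool name <code>tank</code> and the device paths are hypothetical placeholders, and the capacity is nominal (file system overhead is ignored). The script only prints the <code>zpool</code> command so that it can be reviewed first:</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-shell" data-lang="shell"># Nominal usable capacity of a RAID-Z1 pool: one disk's worth of space holds parity.
raidz1_usable_tb() {
  echo $(( ($1 - 1) * $2 ))
}
raidz1_usable_tb 3 20

# Hypothetical pool name and device paths; printed for review only.
# Run the printed command as root once the paths match the real disks.
echo zpool create tank raidz1 /dev/disk/by-id/DISK-A /dev/disk/by-id/DISK-B /dev/disk/by-id/DISK-C
</code></pre></div>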
<h3 id="configuring-vm-disks">
  Configuring VM Disks
  <a href="#configuring-vm-disks">§</a>
</h3>

<p>When configuring hard disks for VMs in Proxmox, there are several options that can improve performance.</p>
<ul>
<li>
<p>Set <strong>SCSI Controller</strong> to VirtIO SCSI Single, which means using a dedicated SCSI controller for each disk rather than one controller shared by multiple disks, which can improve IO performance.</p>
</li>
<li>
<p><strong>SSD emulation</strong>: Proxmox will tell the guest that the disk is an SSD, which can improve performance if the underlying disk is actually an SSD.</p>
</li>
<li>
<p><strong>Cache</strong>: Write-back caching accumulates writes in host RAM and flushes them later. This improves performance but <em>can cause problems on power loss</em>.</p>
</li>
<li>
<p><strong>IO thread</strong>:  Run IO in a separate thread to improve performance.</p>
</li>
<li>
<p><strong>Discard</strong>: Passes discard (TRIM) requests from the guest on to the underlying storage, so that, when you delete files or free blocks inside the VM, the space can be reclaimed on thin-provisioned disks.</p>
</li>
<li>
<p><strong>Async-IO</strong>: You can change this option for better performance, but this led to instability in my case.</p>
</li>
</ul>
<figure><img src="/img/homelab24/disk-config.webp"
    alt="Hard-Disk configuration in Proxmox"><figcaption>
      <p>Hard-Disk configuration in Proxmox</p>
    </figcaption>
</figure>

<p>I did not do any benchmarks, but after switching to the above configuration, the VMs felt like bare metal.</p>
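<p>The same settings can also be applied from the command line with <code>qm set</code>. The following is a sketch under assumptions: the VM ID <code>100</code> and the volume name <code>local-lvm:vm-100-disk-0</code> are hypothetical, and Async-IO is left at its default. The script only prints the commands; they would have to be run on the Proxmox host.</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-shell" data-lang="shell"># Disk options from the list above: write-back cache, discard, IO thread, SSD emulation.
disk_opts() {
  echo "cache=writeback,discard=on,iothread=1,ssd=1"
}

# Hypothetical VM ID and volume name; printed for review only.
VMID=100
VOLUME=local-lvm:vm-100-disk-0

# Use one VirtIO SCSI controller per disk, then attach the disk with the options.
echo qm set "$VMID" --scsihw virtio-scsi-single
echo qm set "$VMID" --scsi0 "$VOLUME,$(disk_opts)"
</code></pre></div>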
<h3 id="backups">
  Backups
  <a href="#backups">§</a>
</h3>

<p>The Pi that runs my DNS server also has a 4TB external HDD available via Samba and is set up as external storage for my Proxmox host.
At regular intervals, Proxmox saves backups of the VM disks to this storage.
This makes it very convenient to restore a particular VM to its state of, say, yesterday in case something goes awry.</p>
<p><strong>Local Backups</strong> I additionally store daily backups on the ZFS RAID.</p>
<p><strong>Offsite Backups</strong> I have further backups in a remote location. These are updated at irregular intervals.</p>
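<p>For the Proxmox backups to the Samba share, a command-line equivalent might look roughly like the sketch below. The storage ID, IP address, share name, user, and VM ID are all hypothetical, and <code>pvesm</code> additionally needs the share's password. Recurring jobs are usually scheduled in the Proxmox web interface instead. The script only prints the commands for review:</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-shell" data-lang="shell"># Register a Samba (CIFS) share as a Proxmox storage that may hold backups.
# Arguments: storage ID, server address, share name, user name (all hypothetical).
add_storage_cmd() {
  echo "pvesm add cifs $1 --server $2 --share $3 --username $4 --content backup"
}

# One-off snapshot-mode backup of a single VM to the given storage.
# Arguments: VM ID, storage ID.
backup_cmd() {
  echo "vzdump $1 --storage $2 --mode snapshot --compress zstd"
}

add_storage_cmd pi-backup 192.0.2.10 backups backup
backup_cmd 100 pi-backup
</code></pre></div>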
<h2 id="virtual-machines">
  Virtual Machines
  <a href="#virtual-machines">§</a>
</h2>

<p>All of these machines run on the main hypervisor node. On the VMs, I run all of the services inside Docker containers. From outside a VM, some of the containers (those exposing web interfaces) are accessible via an Nginx reverse proxy. This proxy takes HTTP(S) requests and forwards them to the corresponding container, based on the <code>Host</code> header of the request.
This allows me to use different subdomains, e.g., <code>wiki.myhost.local</code> or <code>kiwix.myhost.local</code>, to access different services instead of binding them to different ports on the same machine. This way, I only have to remember the subdomain for each service and not some random port number.</p>
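<p>Conceptually, the proxy performs a lookup from host name to upstream container. The following sketch only illustrates that dispatch logic; it is not how Nginx itself is implemented, and the container names and ports are hypothetical:</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-shell" data-lang="shell"># Map the Host header of a request to an upstream container.
# Container names and ports are hypothetical.
route_for_host() {
  case "$1" in
    wiki.myhost.local)  echo "http://mediawiki:80" ;;
    kiwix.myhost.local) echo "http://kiwix:8080" ;;
    *)                  echo "no route" ;;
  esac
}

route_for_host wiki.myhost.local
route_for_host unknown.myhost.local
</code></pre></div>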
<h3 id="internal-services">
  Internal Services
  <a href="#internal-services">§</a>
</h3>

<p>I have a VM that runs most of my internal services and does not have to be accessible from the outside.
Hardware-wise, it uses the following:</p>
<ul>
<li>OS: Debian 12</li>
<li>CPU: 4 vCPU</li>
<li>RAM: 8GB</li>
<li>Storage: 32GB Thin Disk</li>
<li>NIC: single VirtIO (paravirtualized) connected to Internal VLAN</li>
</ul>
<h4 id="applications">
  Applications
  <a href="#applications">§</a>
</h4>

<ul>
<li>Nginx Proxy Manager: reverse proxy that provides HTTPS for web services</li>
<li>FreshRSS: RSS reader</li>
<li>Wallabag: allows you to store websites for later reading, comes with an excellent browser plugin</li>
<li>MediaWiki: basically Wikipedia, used for note-taking</li>
<li>Kiwix: offline reader for web content such as Wikipedia</li>
<li>Uptime Kuma: monitors the online status of websites</li>
<li>Watchtower: automatically updates docker containers when new images are available</li>
<li>Telegraf: metrics collection agent for server monitoring</li>
<li>Stirling-PDF: PDF utilities</li>
<li>UniFi Controller: used to manage the WiFi AP</li>
<li>Portainer: GUI for managing docker containers</li>
<li>OpenWebUI: frontend for interacting with (local) LLMs</li>
<li>Firefly-III: personal finance management</li>
</ul>
<h4 id="nginx-proxy-manager">
  Nginx Proxy Manager
  <a href="#nginx-proxy-manager">§</a>
</h4>

<p>The proxy manager has a wildcard certificate (<code>*.myhost.local</code>) signed by my internal root certificate authority so that I can connect to all of the services via HTTPS.</p>
<p>The proxy and all containers accessible through it share a common docker network called <code>rproxy-network</code>.
This has to be created externally by running:</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>docker network create rproxy-network
</span></span></code></pre></div><p>The <code>docker-compose.yaml</code> looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#81a1c1">version</span><span style="color:#eceff4">:</span> <span style="color:#a3be8c">&#39;3&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1">services</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">app</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">image</span><span style="color:#eceff4">:</span> <span style="color:#a3be8c">&#39;jc21/nginx-proxy-manager&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">container_name</span><span style="color:#eceff4">:</span> proxy-manager
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">restart</span><span style="color:#eceff4">:</span> always
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">ports</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#a3be8c">&#39;81:81&#39;</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#a3be8c">&#39;443:443&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">volumes</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - ./data:/data
</span></span><span style="display:flex;"><span>      - ./letsencrypt:/etc/letsencrypt
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1">networks</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">default</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">external</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">name</span><span style="color:#eceff4">:</span> rproxy-network
</span></span></code></pre></div><p>The forwarding has to be configured manually via the web UI, which runs on port 81.</p>
<h4 id="freshrss">
  FreshRSS
  <a href="#freshrss">§</a>
</h4>

<p>FreshRSS is an RSS reader. The setup with the <a href="https://www.linuxserver.io/">linuxserver.io</a> image is straightforward:</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#81a1c1">version</span><span style="color:#eceff4">:</span> <span style="color:#a3be8c">&#39;3&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1">services</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">freshrss</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">image</span><span style="color:#eceff4">:</span> lscr.io/linuxserver/freshrss:latest
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">container_name</span><span style="color:#eceff4">:</span> freshrss
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">hostname</span><span style="color:#eceff4">:</span> freshrss
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">restart</span><span style="color:#eceff4">:</span> unless-stopped
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">networks</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - rproxy
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">volumes</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - ./data:/config
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">environment</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">TZ</span><span style="color:#eceff4">:</span> Europe/Berlin
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">PUID</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">1000</span>
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">PGID</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">1000</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1">networks</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">rproxy</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">external</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">name</span><span style="color:#eceff4">:</span> rproxy-network
</span></span></code></pre></div><h4 id="wallabag">
  Wallabag
  <a href="#wallabag">§</a>
</h4>

<p>Wallabag allows you to save webpages for later offline reading. There is also a browser plugin for Firefox that lets you add a website swiftly.</p>
<p>The setup is a bit more complex.
Note that only the main container is connected to <code>rproxy-network</code>.</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#81a1c1">version</span><span style="color:#eceff4">:</span> <span style="color:#a3be8c">&#39;3&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1">services</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">wallabag</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">image</span><span style="color:#eceff4">:</span> wallabag/wallabag
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">container_name</span><span style="color:#eceff4">:</span> wallabag
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">restart</span><span style="color:#eceff4">:</span> always
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">networks</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - default
</span></span><span style="display:flex;"><span>      - rproxy
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">environment</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - MYSQL_ROOT_PASSWORD=xxxx
</span></span><span style="display:flex;"><span>      - SYMFONY__ENV__DATABASE_DRIVER=pdo_mysql
</span></span><span style="display:flex;"><span>      - SYMFONY__ENV__DATABASE_HOST=wallabag-db
</span></span><span style="display:flex;"><span>      - SYMFONY__ENV__DATABASE_PORT=3306
</span></span><span style="display:flex;"><span>      - SYMFONY__ENV__DATABASE_NAME=wallabag
</span></span><span style="display:flex;"><span>      - SYMFONY__ENV__DATABASE_USER=xxxx
</span></span><span style="display:flex;"><span>      - SYMFONY__ENV__DATABASE_PASSWORD=xxxx
</span></span><span style="display:flex;"><span>      - SYMFONY__ENV__DATABASE_CHARSET=utf8mb4
</span></span><span style="display:flex;"><span>      - SYMFONY__ENV__DATABASE_TABLE_PREFIX=&#34;wallabag_&#34;
</span></span><span style="display:flex;"><span>      - SYMFONY__ENV__MAILER_DSN=smtp://127.0.0.1
</span></span><span style="display:flex;"><span>      - SYMFONY__ENV__FROM_EMAIL=wallabag@example.com
</span></span><span style="display:flex;"><span>      - SYMFONY__ENV__DOMAIN_NAME=https://wallabag.xxx
</span></span><span style="display:flex;"><span>      - SYMFONY__ENV__SERVER_NAME=&#34;xxx&#34;
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">volumes</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - ./images:/var/www/wallabag/web/assets/images
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">healthcheck</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">test</span><span style="color:#eceff4">:</span> <span style="color:#eceff4">[</span><span style="color:#a3be8c">&#34;CMD&#34;</span><span style="color:#eceff4">,</span> <span style="color:#a3be8c">&#34;wget&#34;</span> <span style="color:#eceff4">,</span><span style="color:#a3be8c">&#34;--no-verbose&#34;</span><span style="color:#eceff4">,</span> <span style="color:#a3be8c">&#34;--tries=1&#34;</span><span style="color:#eceff4">,</span> <span style="color:#a3be8c">&#34;--spider&#34;</span><span style="color:#eceff4">,</span> <span style="color:#a3be8c">&#34;http://localhost&#34;</span><span style="color:#eceff4">]</span>
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">interval</span><span style="color:#eceff4">:</span> 1m
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">timeout</span><span style="color:#eceff4">:</span> 3s
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">depends_on</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - db
</span></span><span style="display:flex;"><span>      - redis
</span></span><span style="display:flex;"><span>      
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">db</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">image</span><span style="color:#eceff4">:</span> mariadb:10.8.2
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">container_name</span><span style="color:#eceff4">:</span> wallabag-db
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">restart</span><span style="color:#eceff4">:</span> always
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">networks</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - default
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">environment</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - MYSQL_ROOT_PASSWORD=xxxx
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">volumes</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - ./data:/var/lib/mysql
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">healthcheck</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">test</span><span style="color:#eceff4">:</span> <span style="color:#eceff4">[</span><span style="color:#a3be8c">&#34;CMD&#34;</span><span style="color:#eceff4">,</span> <span style="color:#a3be8c">&#34;mysqladmin&#34;</span> <span style="color:#eceff4">,</span><span style="color:#a3be8c">&#34;ping&#34;</span><span style="color:#eceff4">,</span> <span style="color:#a3be8c">&#34;-h&#34;</span><span style="color:#eceff4">,</span> <span style="color:#a3be8c">&#34;localhost&#34;</span><span style="color:#eceff4">]</span>
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">interval</span><span style="color:#eceff4">:</span> 20s
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">timeout</span><span style="color:#eceff4">:</span> 3s
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">redis</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">image</span><span style="color:#eceff4">:</span> redis:alpine
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">restart</span><span style="color:#eceff4">:</span> always
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">networks</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - default
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">container_name</span><span style="color:#eceff4">:</span> wallabag-redis
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">healthcheck</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">test</span><span style="color:#eceff4">:</span> <span style="color:#eceff4">[</span><span style="color:#a3be8c">&#34;CMD&#34;</span><span style="color:#eceff4">,</span> <span style="color:#a3be8c">&#34;redis-cli&#34;</span><span style="color:#eceff4">,</span> <span style="color:#a3be8c">&#34;ping&#34;</span><span style="color:#eceff4">]</span>
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">interval</span><span style="color:#eceff4">:</span> 20s
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">timeout</span><span style="color:#eceff4">:</span> 3s
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1">networks</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">rproxy</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">external</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">name</span><span style="color:#eceff4">:</span> rproxy-network
</span></span></code></pre></div><h4 id="mediawiki">
  MediaWiki
  <a href="#mediawiki">§</a>
</h4>

<p>I use MediaWiki mainly for note-taking. While there might be better solutions, it works reasonably well for me.</p>
<p>Here is the <code>docker-compose.yaml</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#81a1c1">version</span><span style="color:#eceff4">:</span> <span style="color:#a3be8c">&#39;3&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1">services</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">mediawiki</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">image</span><span style="color:#eceff4">:</span> mediawiki:1.39.5
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">container_name</span><span style="color:#eceff4">:</span> mediawiki
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">restart</span><span style="color:#eceff4">:</span> always
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">links</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - database
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">volumes</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      <span style="color:#616e87;font-style:italic"># make sure this directory is writable</span>
</span></span><span style="display:flex;"><span>      - ./images/:/var/www/html/images
</span></span><span style="display:flex;"><span>      - ./LocalSettings.php:/var/www/html/LocalSettings.php:ro
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">networks</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - default
</span></span><span style="display:flex;"><span>      - rproxy
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">security_opt</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - seccomp:unconfined <span style="color:#616e87;font-style:italic"># wiki needs online services to render latex</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">database</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">image</span><span style="color:#eceff4">:</span> mariadb:10.8.2 
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">container_name</span><span style="color:#eceff4">:</span> mediawiki-db
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">restart</span><span style="color:#eceff4">:</span> always
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">volumes</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - ./db/:/var/lib/mysql
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">environment</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">MYSQL_DATABASE</span><span style="color:#eceff4">:</span> my_wiki
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">MYSQL_USER</span><span style="color:#eceff4">:</span> wikiuser
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">MYSQL_PASSWORD</span><span style="color:#eceff4">:</span> xxxxx
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">MYSQL_RANDOM_ROOT_PASSWORD</span><span style="color:#eceff4">:</span> <span style="color:#a3be8c">&#39;yes&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">networks</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - default
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1">networks</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">rproxy</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">external</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">name</span><span style="color:#eceff4">:</span> rproxy-network
</span></span></code></pre></div><p><a href="https://en.wikipedia.org/wiki/Seccomp">Seccomp</a> is a security feature of the Linux kernel that restricts which system calls a process may make. Docker uses it to apply a default profile that blocks certain system calls for containers.
Unfortunately, some containers seem to require some of these syscalls.
The option <code>seccomp:unconfined</code> disables seccomp entirely for that container.</p>
<div class="warn">
    Disabling seccomp entirely is probably not optimal. Consider whether this is acceptable with regard to your threat model. Seccomp also supports more <a href="https://docs.docker.com/engine/security/seccomp/">granular syscall access management</a>.
</div>
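<p>As a less drastic alternative to <code>unconfined</code>, a custom seccomp profile can be passed to the container instead. The following is only a minimal sketch: the file name <code>seccomp-mediawiki.json</code> is hypothetical, and it assumes a profile that starts from Docker's default profile and additionally allows just the syscalls the container actually needs.</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml">services:
  mediawiki:
    security_opt:
      # hypothetical profile file stored next to the compose file;
      # replaces seccomp:unconfined with a tailored allow-list
      - seccomp:./seccomp-mediawiki.json
</code></pre></div>
<p>Which syscalls have to be added depends on the container and has to be determined case by case.</p>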
<h4 id="kiwix">
  Kiwix
  <a href="#kiwix">§</a>
</h4>

<p><a href="https://en.wikipedia.org/wiki/Kiwix">Kiwix</a> is an offline browser that serves locally stored copies of entire websites.
This way, you can use Wikipedia or Stack Overflow even without an internet connection.</p>
<div class="note">
    This machine has my NAS mounted to <code>/mnt/</code>.
</div>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#81a1c1">version</span><span style="color:#eceff4">:</span> <span style="color:#a3be8c">&#39;3&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1">services</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">kiwix</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">container_name</span><span style="color:#eceff4">:</span> kiwix-serve
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">image</span><span style="color:#eceff4">:</span> ghcr.io/kiwix/kiwix-serve
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">networks</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - rproxy
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">command</span><span style="color:#eceff4">:</span> <span style="color:#a3be8c">&#39;*.zim&#39;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">volumes</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#a3be8c">&#34;/mnt/app-data/kiwix:/data&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1">networks</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">rproxy</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">external</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">name</span><span style="color:#eceff4">:</span> rproxy-network
</span></span></code></pre></div><p>Offline websites are saved in so-called <code>zim</code> archive files.
Since these can be quite large (for me, the files amount to 246GB), I keep them on the network-attached storage. These are the <code>zim</code> files I have stored for offline use:</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-fallback" data-lang="fallback"><span style="display:flex;"><span>user@machine:~/$ ls -lh /mnt/app-data/kiwix/
</span></span><span style="display:flex;"><span>total 246G
</span></span><span style="display:flex;"><span>-rw-rw---- 1 user user  29M Dec 16  2022 archlinux_en_all_maxi_2022-12.zim
</span></span><span style="display:flex;"><span>-rw-rw---- 1 user user 115M May 16  2024 cooking.stackexchange.com_en_all_2024-05.zim
</span></span><span style="display:flex;"><span>-rw-rw---- 1 user user 3.6G Oct  9 10:48 docs.python.org_en_2024-10.zim
</span></span><span style="display:flex;"><span>-rw-rw---- 1 user user  70M Mar 20  2021 gentoo_en_all_maxi_2021-03.zim
</span></span><span style="display:flex;"><span>-rw-rw---- 1 user user 2.7G Mar  8  2024 gutenberg_de_all_2023-08.zim
</span></span><span style="display:flex;"><span>-rw-rw---- 1 user user 5.9M Aug 10 01:57 mspeekenbrink_en_all_2024-08.zim
</span></span><span style="display:flex;"><span>-rw-rw---- 1 user user 112M Mar 10  2021 rationalwiki_en_all_maxi_2021-03.zim
</span></span><span style="display:flex;"><span>-rw-rw---- 1 user user  75G Dec  1  2023 stackoverflow.com_en_all_2023-11.zim
</span></span><span style="display:flex;"><span>-rw-rw---- 1 user user  74G Jul 18 10:09 ted_mul_all_2024-07.zim
</span></span><span style="display:flex;"><span>-rw-rw---- 1 user user  28M May  7  2024 tor.stackexchange.com_en_all_2024-05.zim
</span></span><span style="display:flex;"><span>-rw-rw---- 1 user user 103G Jan 21  2024 wikipedia_en_all_maxi_2024-01.zim
</span></span></code></pre></div><p>I could imagine that, in the future, it might be possible to use such offline files as a basis for <a href="https://en.wikipedia.org/wiki/Retrieval-augmented_generation">RAG systems</a>.</p>
<h3 id="logging--monitoring">
  Logging &amp; Monitoring
  <a href="#logging--monitoring">§</a>
</h3>

<p>For historical reasons, I have a dedicated VM for logging and monitoring purposes. It lives on the main LAN, and access is managed via firewall.</p>
<ul>
<li>OS: Ubuntu 22.04</li>
<li>CPUs: 2 vCores</li>
<li>RAM: 2GB</li>
<li>Disk: 64GB Thin Disk</li>
<li>NIC: single VirtIO (paravirtualized) connected to the main LAN</li>
</ul>
<h4 id="applications-1">
  Applications
  <a href="#applications-1">§</a>
</h4>

<p>The VM runs the TIG stack, along with Watchtower and Nginx-Proxy-Manager:</p>
<ul>
<li><strong>T</strong>elegraf: monitoring, sends measurements to the database</li>
<li><strong>I</strong>nfluxDB: time series database that works well with Telegraf</li>
<li><strong>G</strong>rafana: visualization dashboard for InfluxDB data</li>
<li>Watchtower</li>
<li>Nginx-Proxy-Manager</li>
</ul>
<p>InfluxDB stores everything on the VM disk. After close to two years of constant operation, it has written around 30GB of data.</p>
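<p>As a rough sketch of how such a stack could be wired together with Docker Compose: the following is an illustration, not the configuration actually running on this VM. It assumes InfluxDB 1.x, and the image tag, directory names, and the <code>telegraf.conf</code> file are all assumptions made for the example.</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml">version: '3'
services:
  influxdb:
    image: influxdb:1.8 # assumption: InfluxDB 1.x
    container_name: influxdb
    restart: always
    volumes:
      # time series data ends up on the VM disk
      - ./influxdb:/var/lib/influxdb

  telegraf:
    image: telegraf
    container_name: telegraf
    restart: always
    volumes:
      # hypothetical config file that points its output at http://influxdb:8086
      - ./telegraf.conf:/etc/telegraf/telegraf.conf:ro
    depends_on:
      - influxdb

  grafana:
    image: grafana/grafana
    container_name: grafana
    restart: always
    volumes:
      # make sure this directory is writable for the Grafana user
      - ./grafana:/var/lib/grafana
    depends_on:
      - influxdb
    networks:
      - default
      - rproxy

networks:
  rproxy:
    external:
      name: rproxy-network
</code></pre></div>
<p>In this sketch, only Grafana joins the reverse proxy network, while InfluxDB and Telegraf stay on the internal default network.</p>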
<figure><img src="/img/homelab24/grafana.webp"
    alt="Example of Grafana UI for monitoring my Router"><figcaption>
      <p>Example of Grafana UI for monitoring my Router</p>
    </figcaption>
</figure>

<h3 id="external-services">
  External Services
  <a href="#external-services">§</a>
</h3>

<p>On this VM, I run everything that is reachable from the outside world.</p>
<ul>
<li>OS: Debian 12</li>
<li>CPUs: 4 vCores</li>
<li>RAM: 16GB</li>
<li>NIC: single VirtIO (paravirtualized) connected to External VLAN</li>
</ul>
<h4 id="applications-2">
  Applications
  <a href="#applications-2">§</a>
</h4>

<ul>
<li>Portainer Agent</li>
<li>Nextcloud</li>
<li>Nginx (3x)</li>
<li>Nginx-Proxy-Manager</li>
<li>GitLab CE</li>
<li>Docker Registry</li>
<li>Transmission: torrent client, in my case, mainly seeding Linux ISOs</li>
<li>Telegraf</li>
<li>Watchtower</li>
</ul>

<h4 id="nextcloud">
  Nextcloud
  <a href="#nextcloud">§</a>
</h4>

<p>Nextcloud is a self-hosted cloud service that can store files, contact information, as well as calendars.
Basically, it acts as a replacement for Google Drive, Dropbox, or similar services that you pay for one way or another.</p>
<p>The service requires a cronjob to run at regular intervals to do some housekeeping in the background. If you do not run these tasks, old access tokens, for example, will never be invalidated automatically. I also had some problems with files remaining locked. These problems can be resolved manually, but doing so is quite tedious. The solution I am using here, running an additional container for this cronjob, is taken from <a href="https://blog.networkprofile.org/vms-and-containers-i-am-running-2023/">here</a>.</p>
<div class="note">
    This VM has a subdirectory of the NAS mounted to <code>/home/user/docker/nextcloud/nextcloud-data/</code>. The directory must be owned by user ID 33, which is <code>www-data</code> inside the container.
</div>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#81a1c1">version</span><span style="color:#eceff4">:</span> <span style="color:#a3be8c">&#39;3&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1">services</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">nextcloud-db</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">image</span><span style="color:#eceff4">:</span> mariadb:11.2.3
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">container_name</span><span style="color:#eceff4">:</span> nextcloud-db
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">restart</span><span style="color:#eceff4">:</span> always
</span></span><span style="display:flex;"><span>    <span style="color:#616e87;font-style:italic"># see https://github.com/nextcloud/server/issues/25436</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">command</span><span style="color:#eceff4">:</span> <span style="color:#eceff4">|</span><span style="color:#616e87">
</span></span></span><span style="display:flex;"><span><span style="color:#616e87">      --transaction-isolation=READ-COMMITTED
</span></span></span><span style="display:flex;"><span><span style="color:#616e87">      --binlog-format=ROW
</span></span></span><span style="display:flex;"><span><span style="color:#616e87">      --skip-innodb-read-only-compressed</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">networks</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - default
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">volumes</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - ./mysql/:/var/lib/mysql
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">environment</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - MYSQL_ROOT_PASSWORD=xxxx
</span></span><span style="display:flex;"><span>      - MYSQL_PASSWORD=xxxx
</span></span><span style="display:flex;"><span>      - MYSQL_DATABASE=xxxx
</span></span><span style="display:flex;"><span>      - MYSQL_USER=xxxx
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">nextcloud</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">image</span><span style="color:#eceff4">:</span> nextcloud:latest
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">container_name</span><span style="color:#eceff4">:</span> nextcloud
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">volumes</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - ./nextcloud/:/var/www/html
</span></span><span style="display:flex;"><span>      - ./nextcloud-data:/var/www/html/data
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">restart</span><span style="color:#eceff4">:</span> always
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">depends_on</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - nextcloud-db
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">environment</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - NEXTCLOUD_ADMIN_USER=xxxx
</span></span><span style="display:flex;"><span>      - NEXTCLOUD_ADMIN_PASSWORD=xxxx
</span></span><span style="display:flex;"><span>      - MYSQL_DATABASE=nextcloud
</span></span><span style="display:flex;"><span>      - MYSQL_USER=xxxx
</span></span><span style="display:flex;"><span>      - MYSQL_PASSWORD=xxxx
</span></span><span style="display:flex;"><span>      - MYSQL_HOST=nextcloud-db
</span></span><span style="display:flex;"><span>      - VIRTUAL_HOST=xxxx
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">networks</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - rproxy
</span></span><span style="display:flex;"><span>      - default
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#616e87;font-style:italic"># taken from https://blog.networkprofile.org/vms-and-containers-i-am-running-2023/</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">cron</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">image</span><span style="color:#eceff4">:</span> nextcloud:latest
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">container_name</span><span style="color:#eceff4">:</span> nextcloud-cron
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">restart</span><span style="color:#eceff4">:</span> unless-stopped
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">volumes</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - ./nextcloud/:/var/www/html
</span></span><span style="display:flex;"><span>      - ./nextcloud-data/:/var/www/html/data
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">entrypoint</span><span style="color:#eceff4">:</span> /cron.sh
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">depends_on</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - nextcloud-db
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">networks</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - default
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1">networks</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">rproxy</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">external</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">name</span><span style="color:#eceff4">:</span> rproxy-network
</span></span></code></pre></div><h3 id="gpu-server">
  GPU Server
  <a href="#gpu-server">§</a>
</h3>

<p>I have a VM with a GPU that is passed through from the physical host.
This VM runs all of the containers that can benefit from hardware acceleration.
Additionally, as the GPU has HDMI outputs, I can connect this VM to a TV.
This way, I do not have to power up my personal computer if I just need a desktop or want to watch a movie.</p>
<ul>
<li>OS: Debian 12</li>
<li>CPUs: 6 vCores</li>
<li>RAM: 16GB</li>
<li>Storage: 32GB Thin Disk</li>
<li>GPU: Nvidia 1060 (6GB VRAM)</li>
<li>NIC: single VirtIO (paravirtualized) connected to External VLAN</li>
</ul>
<p>Setting up PCIe passthrough in Proxmox was quite tricky.
There are several guides that help you to configure your system.
In Proxmox, it looks as follows:</p>
<figure><img src="/img/homelab24/proxmox-gpu-pass.webp"
    alt="GPU passthrough in Proxmox"><figcaption>
      <p>GPU passthrough in Proxmox</p>
    </figcaption>
</figure>

<p>However, in my case, I additionally had to:</p>
<ul>
<li>add <code>video=efifb:off</code> to the kernel parameters of the Proxmox host</li>
<li>use UEFI for the VM to enable the <em>primary GPU</em> option in Proxmox</li>
<li>disable secure boot in the VM</li>
<li>set the VM's CPU model to <code>host</code></li>
</ul>
<p>Afterward, you will have to install the <code>nvidia-container-toolkit</code> so that Docker containers can use the GPU.</p>
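<p>To check that containers can actually see the GPU once the toolkit is installed, a throwaway Compose service along the following lines can help. This is a sketch under assumptions: the service name <code>gpu-test</code> is hypothetical, and it relies on the NVIDIA runtime making <code>nvidia-smi</code> available inside a plain <code>ubuntu</code> container.</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml">services:
  gpu-test: # hypothetical one-off service, remove after testing
    image: ubuntu
    # prints the GPU table if the passthrough and the toolkit work
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
</code></pre></div>
<p>If the container exits with an error instead of listing the GPU, the problem most likely lies in the driver or toolkit installation rather than in the individual application containers.</p>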
<p><strong>Applications</strong></p>
<ul>
<li>Jellyfin</li>
<li>Ollama</li>
<li>Watchtower</li>
</ul>
<h4 id="jellyfin">
  Jellyfin
  <a href="#jellyfin">§</a>
</h4>

<p>I possess quite an extensive and ever-growing media collection.
<a href="https://en.wikipedia.org/wiki/Jellyfin">Jellyfin</a> is an open-source media server that lets you access your media files through a web interface that looks similar to Netflix, Spotify, etc. Jellyfin can use the GPU to accelerate media transcoding.</p>
<p>There are also some well-maintained mobile clients, such as <a href="https://github.com/jmshrv/finamp">Finamp</a>, which you can use as a replacement for Spotify on your phone.</p>
<div class="note">
    This VM has the NAS mounted to <code>/mnt/</code>.
</div>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#81a1c1">version</span><span style="color:#eceff4">:</span> <span style="color:#a3be8c">&#34;3.5&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1">services</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">jellyfin</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>        <span style="color:#81a1c1">image</span><span style="color:#eceff4">:</span> linuxserver/jellyfin
</span></span><span style="display:flex;"><span>        <span style="color:#81a1c1">container_name</span><span style="color:#eceff4">:</span> jellyfin
</span></span><span style="display:flex;"><span>        <span style="color:#81a1c1">ports</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#b48ead">80</span><span style="color:#eceff4">:</span><span style="color:#b48ead">8096</span>
</span></span><span style="display:flex;"><span>          - <span style="color:#b48ead">443</span><span style="color:#eceff4">:</span><span style="color:#b48ead">8920</span>
</span></span><span style="display:flex;"><span>        <span style="color:#81a1c1">volumes</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>          - ./config:/config
</span></span><span style="display:flex;"><span>          - /mnt/media/Video:/media/Video
</span></span><span style="display:flex;"><span>          - /mnt/media/Music:/media/Music
</span></span><span style="display:flex;"><span>        <span style="color:#81a1c1">environment</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>          - PUID=1000
</span></span><span style="display:flex;"><span>          - PGID=1000
</span></span><span style="display:flex;"><span>          - NVIDIA_DRIVER_CAPABILITIES=all
</span></span><span style="display:flex;"><span>          - NVIDIA_VISIBLE_DEVICES=all
</span></span><span style="display:flex;"><span>          - JELLYFIN_PublishedServerUrl=xxxx
</span></span><span style="display:flex;"><span>        <span style="color:#81a1c1">restart</span><span style="color:#eceff4">:</span> <span style="color:#a3be8c">&#34;unless-stopped&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#81a1c1">deploy</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>          <span style="color:#81a1c1">resources</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>            <span style="color:#81a1c1">reservations</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>              <span style="color:#81a1c1">devices</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>                - <span style="color:#81a1c1">driver</span><span style="color:#eceff4">:</span> nvidia
</span></span><span style="display:flex;"><span>                  <span style="color:#81a1c1">count</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">1</span>
</span></span><span style="display:flex;"><span>                  <span style="color:#81a1c1">capabilities</span><span style="color:#eceff4">:</span> <span style="color:#eceff4">[</span>gpu]
</span></span></code></pre></div><h4 id="ollama">
  Ollama
  <a href="#ollama">§</a>
</h4>

<p>Ollama allows you to run LLMs locally with GPU acceleration via CUDA.
The models are then served via an API, which can be neatly integrated with several other projects.
Currently, the LLM weights are stored on the NAS, which drastically increases the latency of the first request (since the model has to be loaded into RAM over the network). However, many of the models are simply too large for the VM disk.</p>
<div class="note">
    This VM has the NAS mounted to <code>/mnt/</code>.
</div>
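<p>As a minimal sketch of how a client might talk to this API: the snippet assumes the container defined below is reachable on <code>localhost:11434</code> and that a model called <code>llama3</code> has already been pulled. Both the host and the model name are placeholders, not part of my setup description.</p>

```python
import json
import urllib.request

# Assumed host; the port matches the one published in the compose file.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="llama3"):
    """Build a non-streaming request for Ollama's /api/generate endpoint.

    The default model name is a placeholder; use any model that has been pulled.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3", timeout=600):
    """Send the prompt and return the generated text.

    The generous timeout leaves room for the slow first load from the NAS.
    """
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

<p>Calling <code>generate("Warum ist der Himmel blau?")</code> would then return the model's answer as a string, provided the server is up and the model exists.</p>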
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#81a1c1">version</span><span style="color:#eceff4">:</span> <span style="color:#a3be8c">&#39;3&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1">services</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">ollama</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">image</span><span style="color:#eceff4">:</span> ollama/ollama
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">container_name</span><span style="color:#eceff4">:</span> ollama
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">ports</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#a3be8c">&#34;11434:11434&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">volumes</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - /mnt/app-data/ollama:/root/.ollama
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">restart</span><span style="color:#eceff4">:</span> <span style="color:#a3be8c">&#34;unless-stopped&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">environment</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      - OLLAMA_KEEP_ALIVE=&#34;60m&#34;
</span></span><span style="display:flex;"><span>      - OLLAMA_LOAD_TIMEOUT=&#34;60m&#34;
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">deploy</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">resources</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>        <span style="color:#81a1c1">reservations</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>          <span style="color:#81a1c1">devices</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#81a1c1">driver</span><span style="color:#eceff4">:</span> nvidia
</span></span><span style="display:flex;"><span>              <span style="color:#81a1c1">count</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">1</span>
</span></span><span style="display:flex;"><span>              <span style="color:#81a1c1">capabilities</span><span style="color:#eceff4">:</span> <span style="color:#eceff4">[</span>gpu]
</span></span></code></pre></div><h3 id="home-assistant">
  Home Assistant
  <a href="#home-assistant">§</a>
</h3>

<p><a href="https://en.wikipedia.org/wiki/Home_Assistant">Home Assistant</a> is a very good open-source home automation platform. The project also provides a VM disk image that can be imported to set everything up with minimal effort.</p>
<ul>
<li>OS: Home Assistant OS</li>
<li>CPU: 4 vCore</li>
<li>RAM: 4GB</li>
<li>Storage: 32GB Thin Disk</li>
<li>NIC: single VirtIO (paravirtualized) connected to IoT VLAN</li>
</ul>
<h3 id="tor-relay">
  Tor Relay
  <a href="#tor-relay">§</a>
</h3>

<p>After reading <a href="https://en.wikipedia.org/wiki/Permanent_Record_(autobiography)">Permanent Record</a> by Snowden, I decided to donate some of my bandwidth to the <a href="https://en.wikipedia.org/wiki/Tor_(network)">Tor network</a>.
I configured the server to <strong>not</strong> run as an exit node, since my IP address would otherwise end up on some blacklists, and there might even be legal ramifications.
Instead, the software acts as a relay, passing traffic from one Tor node to another.</p>
<ul>
<li>CPU: 1 vCore</li>
<li>RAM: 1GB</li>
<li>Storage: 32GB Thin Disk</li>
<li>NIC: single VirtIO (paravirtualized) connected to External VLAN</li>
</ul>
<p>For security reasons, the Tor relay runs in its own VM, in the &ldquo;external services&rdquo; VLAN, so that it is isolated from most other systems in the network.</p>
<p>Over the last year, the node has transferred several terabytes of data.</p>
<figure><img src="/img/homelab24/tor-node.webp"
    alt="Screenshot of the Tor node monitoring software Nyx."><figcaption>
      <p>Screenshot of the Tor node monitoring software Nyx.</p>
    </figcaption>
</figure>


<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>Overall, this post is heavily inspired by <a href="https://blog.networkprofile.org/vms-and-containers-i-am-running-2023/">this one</a>.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>
]]></content:encoded></item><item><title>Training a German LLM from Scratch</title><link>https://www.kkirchheim.de/blog/german-gpt/</link><pubDate>Thu, 14 Nov 2024 00:00:00 +0000</pubDate><guid>https://www.kkirchheim.de/blog/german-gpt/</guid><description>&lt;div class="warn"&gt;
This article is not finished and will be updated.
&lt;/div&gt;
&lt;p&gt;&lt;span class="dropcap"&gt;T&lt;/span&gt;&lt;span class="dropcap-rest"&gt;he research group &lt;/span&gt; I work with has access to a small GPU cluster, which occasionally sits idle. To avoid wasting valuable compute resources (IDLE GPUs essentially burn money through opportunity costs), I decided to train a German &lt;a href="https://en.wikipedia.org/wiki/GPT-2"&gt;GPT-2-style model&lt;/a&gt; from scratch, using only German text.&lt;/p&gt;
&lt;p&gt;Existing German models available on Hugging Face have 137M parameters and a context length of 1024 tokens&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;, which is quite limited compared to recently released models, such as those in the &lt;a href="https://en.wikipedia.org/wiki/Llama_(language_model)"&gt;LLAMA&lt;/a&gt; family.&lt;/p&gt;</description><content:encoded><![CDATA[<div class="warn">
    This article is not finished and will be updated.
</div>
<p><span class="dropcap">T</span><span class="dropcap-rest">he research group </span> I work with has access to a small GPU cluster, which occasionally sits idle. To avoid wasting valuable compute resources (idle GPUs essentially burn money through opportunity costs), I decided to train a German <a href="https://en.wikipedia.org/wiki/GPT-2">GPT-2-style model</a> from scratch, using only German text.</p>
<p>Existing German models available on Hugging Face have 137M parameters and a context length of 1024 tokens<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>, which is quite limited compared to recently released models, such as those in the <a href="https://en.wikipedia.org/wiki/Llama_(language_model)">LLAMA</a> family.</p>
<div class="note">
    <p>After the training and writing the first draft of this article, I became aware of some larger German models, such as the</p>
<ul>
<li><a href="https://arxiv.org/abs/2411.11171">LLäMmlein</a>,</li>
<li><a href="https://huggingface.co/VAGOsolutions">Sauerkraut</a> (by VAGO) and</li>
<li><a href="https://huggingface.co/flair/bueble-lm-2b">Bueble-LM</a> series.</li>
</ul>
<p>While the existence of these larger and more capable models probably means that the one presented here will not be used as much, I still enjoyed the learning experience.</p>

</div>
<p>To make the model at least somewhat competitive with current alternatives, I aimed to support a context length of at least twice the 1024 tokens offered by the existing German models.
I also wanted the model to have more parameters, which generally improves model quality.
Therefore, I set out to train a GPT-2-style model with 358M parameters and a context window of 2048 tokens. While still modest compared to state-of-the-art models, it is an improvement over the existing German models.
The resulting model is available on Hugging Face at <a href="https://huggingface.co/kkirchheim/german-gpt2-medium">kkirchheim/german-gpt2-medium</a>.</p>
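<p>To get a rough feeling for where such a parameter count comes from, here is a small sketch that counts the weights of a GPT-2-style architecture. It assumes the standard GPT-2 medium dimensions (24 layers, hidden size 1024), tied input and output embeddings, and the original GPT-2 vocabulary of 50,257 tokens. The vocabulary of the tokenizer actually used may differ, so the numbers are only an approximation of the final model.</p>

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer):
    """Count the parameters of a GPT-2-style model with tied embeddings."""
    token_embeddings = vocab_size * n_embd
    position_embeddings = n_positions * n_embd
    # Per block: attention (4*d^2 + 4*d), MLP (8*d^2 + 5*d), two layer norms (4*d).
    per_block = 12 * n_embd**2 + 13 * n_embd
    final_layer_norm = 2 * n_embd
    return (
        token_embeddings
        + position_embeddings
        + n_layer * per_block
        + final_layer_norm
    )


# Original GPT-2 medium versus the same model with a 2048-token context
original = gpt2_param_count(vocab_size=50257, n_positions=1024, n_embd=1024, n_layer=24)
extended = gpt2_param_count(vocab_size=50257, n_positions=2048, n_embd=1024, n_layer=24)
print(f"original: {original:,}  extended: {extended:,}  added: {extended - original:,}")
```

<p>Under these assumptions, doubling the context window only adds the extra position embeddings, which is roughly one million parameters. The bulk of the model sits in the transformer blocks.</p>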
<h2 id="dataset">
  Dataset
  <a href="#dataset">§</a>
</h2>

<p>A large dataset is required before training a model. Since this LLM is German-only, it was crucial to ensure that the collected texts were in German.</p>
<h3 id="selection">
  Selection
  <a href="#selection">§</a>
</h3>

<p>While we could have scraped the internet ourselves to gather enough data, this would be a lengthy process, requiring a custom crawler seeded with relevant pages and a substantial runtime.</p>
<p>Thankfully, others have already done this work: <a href="https://commoncrawl.org/">Common Crawl</a> provides a massive text dataset from internet scrapes spanning the past decade. A derivative project, the <a href="https://german-nlp-group.github.io/projects/gc4-corpus.html">German Colossal, Cleaned Common Crawl corpus (GC4)</a>, contains the German subset of the entire Common Crawl. This means that we do not have to download the entire internet and filter for German content manually.</p>
<!-- This dataset also includes quality information about the texts. We selected only the highest-quality texts, such as those from newspapers, government sites, Wikipedia, and similar sources. -->
<p>Since the data was scraped between 2015 and 2020, the knowledge cutoff of our LLM is 2020. For context, existing German-only models were trained on just 90GB of text.<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup></p>
<p>The dataset is publicly available, which is nice for reproducibility. However, since it is a collection of scraped data, we do not hold licenses for the content. For research purposes, training models on such content is permitted.<sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup></p>
<h3 id="preparation">
  Preparation
  <a href="#preparation">§</a>
</h3>

<p>To start, we downloaded all the <code>.tar</code> archives listed on the website - around 180GB of compressed text. After extraction, we are left with 300GB of uncompressed, high-quality German text in something similar to JSON format.<sup id="fnref:4"><a href="#fn:4" class="footnote-ref" role="doc-noteref">4</a></sup>
We can inspect the resulting files with</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>head de_head_extracted/de_head_0000_2015-48.txt
</span></span></code></pre></div><p>which gives us something like</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#eceff4">{</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;url&#34;</span><span style="color:#eceff4">:</span> <span style="color:#bf616a">...</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;date_download&#34;</span><span style="color:#eceff4">:</span> <span style="color:#bf616a">...</span><span style="color:#eceff4">,</span> 
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;length&#34;</span><span style="color:#eceff4">:</span> <span style="color:#bf616a">...</span><span style="color:#eceff4">,</span> 
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;nlines&#34;</span><span style="color:#eceff4">:</span> <span style="color:#bf616a">...</span><span style="color:#eceff4">,</span> 
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;source_domain&#34;</span><span style="color:#eceff4">:</span> <span style="color:#bf616a">...</span><span style="color:#eceff4">,</span> 
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;title&#34;</span><span style="color:#eceff4">:</span> <span style="color:#bf616a">...</span><span style="color:#eceff4">,</span> 
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;language&#34;</span><span style="color:#eceff4">:</span> <span style="color:#a3be8c">&#34;de&#34;</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;language_score&#34;</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">0.99</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;raw_content&#34;</span><span style="color:#eceff4">:</span> <span style="color:#a3be8c">&#34;Siegmar Gerber Titel:\nAnwendungslösungen zur Simulation von Rechenanlagen auf dem ZRA 1 und zur Bibliographieautomatisierung mit Hilfe des Rechners ODRA Erscheinungsdatum:\nIm Beitrag werden Lösungen für zwei Anwendungsprojekte beschrieben, die in den sechziger Jahren am Institut für Maschinelle Rechentechnik der Leipziger Universität mit Hilfe der Rechenanlagen ZRA 1 bzw. ODRA realisiert wurden.&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#eceff4">}</span>
</span></span></code></pre></div><p>Here, <code>raw_content</code> is the field that we are interested in, as it contains the extracted text from the scraped websites.
We can use the other fields to get some insights into our dataset, and filter for higher-quality content.</p>
<p>So, first, we remove all entries with a language score &lt; 0.98, which ensures that our dataset contains only German webpages.
Then, we can investigate where the articles come from by counting the values of the <code>source_domain</code> field:</p>
<figure><img src="/img/german-gpt/article-origin.webp"
    alt="Article Sources"><figcaption>
      <p>Article Sources</p>
    </figcaption>
</figure>

<p>As we can see, the dataset contains mostly news sites and Wikipedia.
Furthermore, we can inspect the length of the articles:</p>
<figure><img src="/img/german-gpt/article-lengths.webp"
    alt="Distribution of length of articles (in characters)"><figcaption>
      <p>Distribution of length of articles (in characters)</p>
    </figcaption>
</figure>
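<p>The filtering and the two statistics above can be sketched in a few lines of plain Python. This is an illustration on a tiny made-up sample, not the script I actually ran. The helper names are hypothetical, and the records are assumed to be dictionaries with the fields shown earlier.</p>

```python
from collections import Counter

MIN_LANGUAGE_SCORE = 0.98  # threshold mentioned in the text


def keep_german(records, min_score=MIN_LANGUAGE_SCORE):
    """Keep only records whose language score is at least min_score."""
    return [r for r in records if r["language_score"] >= min_score]


def count_domains(records):
    """Count how many records come from each source domain."""
    return Counter(r["source_domain"] for r in records)


def article_lengths(records):
    """Return the length of each article in characters."""
    return [len(r["raw_content"]) for r in records]


# Tiny made-up sample, for illustration only
sample = [
    {"source_domain": "example.de", "language_score": 0.99, "raw_content": "Guten Tag"},
    {"source_domain": "example.de", "language_score": 0.97, "raw_content": "Hello"},
    {"source_domain": "beispiel.de", "language_score": 0.98, "raw_content": "Hallo Welt"},
]

german = keep_german(sample)
print(count_domains(german).most_common())
print(article_lengths(german))
```

<p>On the full corpus, the same logic would have to be applied in a streaming fashion, since the data does not fit into memory.</p>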

<p>We store the filtered dataset as JSON, discarding all fields apart from <code>raw_content</code>.
This allows us to load the dataset directly using the Hugging Face <code>datasets</code> library:</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#81a1c1;font-weight:bold">from</span> <span style="color:#8fbcbb">datasets</span> <span style="color:#81a1c1;font-weight:bold">import</span> load_dataset
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>dataset <span style="color:#81a1c1">=</span> load_dataset<span style="color:#eceff4">(</span>
</span></span><span style="display:flex;"><span>    <span style="color:#a3be8c">&#39;json&#39;</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    cache_dir<span style="color:#81a1c1">=</span><span style="color:#a3be8c">&#34;./cache&#34;</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    data_files<span style="color:#81a1c1">=</span><span style="color:#eceff4">[</span><span style="color:#a3be8c">&#39;de_head_extracted/*.json&#39;</span><span style="color:#eceff4">],</span>
</span></span><span style="display:flex;"><span>    split<span style="color:#81a1c1">=</span><span style="color:#a3be8c">&#34;train&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#eceff4">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1">print</span><span style="color:#eceff4">(</span><span style="color:#a3be8c">f</span><span style="color:#a3be8c">&#34;Length: </span><span style="color:#a3be8c">{</span><span style="color:#81a1c1">len</span><span style="color:#eceff4">(</span>dataset<span style="color:#eceff4">)</span><span style="color:#a3be8c">}</span><span style="color:#a3be8c">&#34;</span><span style="color:#eceff4">)</span>
</span></span></code></pre></div><p>This tells us that there are 117,412,577 texts in total.
After loading everything, the cache takes up 1.2 TB on disk.</p>
<h2 id="training">
  Training
  <a href="#training">§</a>
</h2>

<p>Training an LLM involves two main steps: first, creating a tokenizer to map character sequences to tokens that the LLM can process (and vice versa). Second, training the LLM to predict a probability distribution over the next tokens, given preceding tokens in the text.</p>
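<p>To make these two steps concrete, here is a toy sketch: a character-level tokenizer and a next-token distribution estimated from bigram counts. Both are deliberately simplistic stand-ins, chosen only for illustration. The actual model uses a subword tokenizer and a transformer that conditions on the entire preceding context, not just on the previous token.</p>

```python
from collections import Counter, defaultdict


class CharTokenizer:
    """Toy tokenizer: every distinct character is one token."""

    def __init__(self, text):
        self.vocab = sorted(set(text))
        self.token_to_id = {ch: i for i, ch in enumerate(self.vocab)}

    def encode(self, text):
        """Map a string to a list of token ids."""
        return [self.token_to_id[ch] for ch in text]

    def decode(self, ids):
        """Map a list of token ids back to a string."""
        return "".join(self.vocab[i] for i in ids)


def next_token_distribution(ids):
    """Estimate P(next token | previous token) from bigram counts."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(ids, ids[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {tok: n / sum(c.values()) for tok, n in c.items()}
        for prev, c in counts.items()
    }


text = "der die das der die"
tokenizer = CharTokenizer(text)
ids = tokenizer.encode(text)
probs = next_token_distribution(ids)

# Distribution over the characters that follow "d" in the toy text
d_id = tokenizer.token_to_id["d"]
print({tokenizer.decode([tok]): p for tok, p in probs[d_id].items()})
```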
<h3 id="tokenization">
  Tokenization
  <a href="#tokenization">§</a>
</h3>

<p>Training a tokenizer with Hugging Face is quite straightforward<sup id="fnref:5"><a href="#fn:5" class="footnote-ref" role="doc-noteref">5</a></sup>, and I gave it a try. However, in the end, I opted to reuse the tokenizer of <code>stefan-it/german-gpt2-larger</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#81a1c1;font-weight:bold">from</span> <span style="color:#8fbcbb">transformers</span> <span style="color:#81a1c1;font-weight:bold">import</span> AutoTokenizer
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>tokenizer <span style="color:#81a1c1">=</span> AutoTokenizer<span style="color:#81a1c1">.</span>from_pretrained<span style="color:#eceff4">(</span><span style="color:#a3be8c">&#34;stefan-it/german-gpt2-larger&#34;</span><span style="color:#eceff4">)</span>
</span></span><span style="display:flex;"><span>tokenizer<span style="color:#81a1c1">.</span>pad_token <span style="color:#81a1c1">=</span> tokenizer<span style="color:#81a1c1">.</span>eos_token
</span></span></code></pre></div><p>There are better tokenizers available that, as far as I know, differ mainly in how they deal with numerals.</p>
<p>We tokenize the entire dataset, caching the results on disk:</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#616e87;font-style:italic"># Tokenize the dataset and count tokens in one step</span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1;font-weight:bold">def</span> <span style="color:#88c0d0">tokenize_and_count</span><span style="color:#eceff4">(</span>examples<span style="color:#eceff4">):</span>
</span></span><span style="display:flex;"><span>    tokenized <span style="color:#81a1c1">=</span> tokenizer<span style="color:#eceff4">(</span>
</span></span><span style="display:flex;"><span>        examples<span style="color:#eceff4">[</span><span style="color:#a3be8c">&#34;raw_content&#34;</span><span style="color:#eceff4">],</span>
</span></span><span style="display:flex;"><span>        truncation<span style="color:#81a1c1">=</span><span style="color:#81a1c1;font-weight:bold">True</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>        max_length<span style="color:#81a1c1">=</span><span style="color:#b48ead">2048</span>
</span></span><span style="display:flex;"><span>    <span style="color:#eceff4">)</span>
</span></span><span style="display:flex;"><span>    tokenized<span style="color:#eceff4">[</span><span style="color:#a3be8c">&#34;num_tokens&#34;</span><span style="color:#eceff4">]</span> <span style="color:#81a1c1">=</span> <span style="color:#eceff4">[</span><span style="color:#81a1c1">len</span><span style="color:#eceff4">(</span>t<span style="color:#eceff4">)</span> <span style="color:#81a1c1;font-weight:bold">for</span> t <span style="color:#81a1c1;font-weight:bold">in</span> tokenized<span style="color:#eceff4">[</span><span style="color:#a3be8c">&#34;input_ids&#34;</span><span style="color:#eceff4">]]</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1;font-weight:bold">return</span> tokenized
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#616e87;font-style:italic"># Tokenize and count in a single step</span>
</span></span><span style="display:flex;"><span>tokenized_dataset <span style="color:#81a1c1">=</span> dataset<span style="color:#81a1c1">.</span>map<span style="color:#eceff4">(</span>
</span></span><span style="display:flex;"><span>    tokenize_and_count<span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    batched<span style="color:#81a1c1">=</span><span style="color:#81a1c1;font-weight:bold">True</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    num_proc<span style="color:#81a1c1">=</span><span style="color:#b48ead">128</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    cache_file_name<span style="color:#81a1c1">=</span><span style="color:#a3be8c">&#34;cache-tokenized/.tokenized_dataset_cache&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#eceff4">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>total_tokens <span style="color:#81a1c1">=</span> <span style="color:#81a1c1">sum</span><span style="color:#eceff4">(</span>tokenized_dataset<span style="color:#eceff4">[</span><span style="color:#a3be8c">&#34;num_tokens&#34;</span><span style="color:#eceff4">])</span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1">print</span><span style="color:#eceff4">(</span><span style="color:#a3be8c">f</span><span style="color:#a3be8c">&#34;Total number of tokens: </span><span style="color:#a3be8c">{</span>total_tokens<span style="color:#a3be8c">}</span><span style="color:#a3be8c">&#34;</span><span style="color:#eceff4">)</span>
</span></span></code></pre></div><p>This tells us that the entire dataset has 66,537,920,947 tokens.
The <code>num_proc=128</code> parameter spreads the tokenization across 128 processes, which cuts the runtime from 24h to &lt; 1h.
We can then split the dataset into a training and a validation portion.</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#616e87;font-style:italic"># Split the dataset into train and validation sets</span>
</span></span><span style="display:flex;"><span>train_val_split <span style="color:#81a1c1">=</span> tokenized_dataset<span style="color:#81a1c1">.</span>train_test_split<span style="color:#eceff4">(</span>test_size<span style="color:#81a1c1">=</span><span style="color:#b48ead">0.0001</span><span style="color:#eceff4">)</span> 
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#616e87;font-style:italic"># Get the train and validation sets</span>
</span></span><span style="display:flex;"><span>train_dataset <span style="color:#81a1c1">=</span> train_val_split<span style="color:#eceff4">[</span><span style="color:#a3be8c">&#39;train&#39;</span><span style="color:#eceff4">]</span>
</span></span><span style="display:flex;"><span>val_dataset <span style="color:#81a1c1">=</span> train_val_split<span style="color:#eceff4">[</span><span style="color:#a3be8c">&#39;test&#39;</span><span style="color:#eceff4">]</span>
</span></span></code></pre></div><h3 id="model-configuration">
  Model Configuration
  <a href="#model-configuration">§</a>
</h3>

<p>As described earlier, we want to train a GPT-2 medium model, but with an increased context size.
How do we do this?</p>
<p>In Hugging Face, models are described by <code>config.json</code> configuration files that parameterize the architecture.
The original configuration for a gpt2-medium looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#eceff4">{</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;activation_function&#34;</span><span style="color:#eceff4">:</span> <span style="color:#a3be8c">&#34;gelu_new&#34;</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;architectures&#34;</span><span style="color:#eceff4">:</span> <span style="color:#eceff4">[</span>
</span></span><span style="display:flex;"><span>    <span style="color:#a3be8c">&#34;GPT2LMHeadModel&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#eceff4">],</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;attn_pdrop&#34;</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">0.1</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;bos_token_id&#34;</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">50256</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;embd_pdrop&#34;</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">0.1</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;eos_token_id&#34;</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">50256</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;initializer_range&#34;</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">0.02</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;layer_norm_epsilon&#34;</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">1e-05</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;model_type&#34;</span><span style="color:#eceff4">:</span> <span style="color:#a3be8c">&#34;gpt2&#34;</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;n_ctx&#34;</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">1024</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;n_embd&#34;</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">1024</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;n_head&#34;</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">16</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;n_layer&#34;</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">24</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;n_positions&#34;</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">1024</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;n_special&#34;</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">0</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;predict_special_tokens&#34;</span><span style="color:#eceff4">:</span> <span style="color:#81a1c1;font-weight:bold">true</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;resid_pdrop&#34;</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">0.1</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;summary_activation&#34;</span><span style="color:#eceff4">:</span> <span style="color:#81a1c1;font-weight:bold">null</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;summary_first_dropout&#34;</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">0.1</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;summary_proj_to_labels&#34;</span><span style="color:#eceff4">:</span> <span style="color:#81a1c1;font-weight:bold">true</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;summary_type&#34;</span><span style="color:#eceff4">:</span> <span style="color:#a3be8c">&#34;cls_index&#34;</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;summary_use_proj&#34;</span><span style="color:#eceff4">:</span> <span style="color:#81a1c1;font-weight:bold">true</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;task_specific_params&#34;</span><span style="color:#eceff4">:</span> <span style="color:#eceff4">{</span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1">&#34;text-generation&#34;</span><span style="color:#eceff4">:</span> <span style="color:#eceff4">{</span>
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">&#34;do_sample&#34;</span><span style="color:#eceff4">:</span> <span style="color:#81a1c1;font-weight:bold">true</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>      <span style="color:#81a1c1">&#34;max_length&#34;</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">50</span>
</span></span><span style="display:flex;"><span>    <span style="color:#eceff4">}</span>
</span></span><span style="display:flex;"><span>  <span style="color:#eceff4">},</span>
</span></span><span style="display:flex;"><span>  <span style="color:#81a1c1">&#34;vocab_size&#34;</span><span style="color:#eceff4">:</span> <span style="color:#b48ead">50257</span>
</span></span><span style="display:flex;"><span><span style="color:#eceff4">}</span>
</span></span></code></pre></div><p>The documentation for these hyperparameters is <a href="https://huggingface.co/transformers/v2.4.0/_modules/transformers/configuration_gpt2.html">here</a>.
There are a couple of modifications that we have to make:</p>
<ul>
<li><strong>n_positions</strong>: the maximum number of tokens that the model can be used with, which we adjust to 2048</li>
<li><strong>n_ctx</strong>: this is the actual context length, so we set it to 2048 as well.</li>
</ul>
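<p>As a rough sketch, the two changes can also be applied programmatically before the file is written. The helper names and the small <code>base</code> excerpt below are made up for illustration; the real <code>config.json</code> contains all the keys shown above.</p>

```python
import json
from pathlib import Path


def extend_context(config: dict, length: int = 2048) -> dict:
    """Return a copy of a GPT-2 config dict with a longer context window."""
    updated = dict(config)
    updated["n_positions"] = length  # maximum number of token positions
    updated["n_ctx"] = length        # context length, kept in sync
    return updated


def write_config(config: dict, model_dir: str = "mymodel") -> Path:
    """Write the config to model_dir/config.json and return the file path."""
    path = Path(model_dir) / "config.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))
    return path


# Hypothetical excerpt of the original config; only the two keys change.
base = {"n_positions": 1024, "n_ctx": 1024, "vocab_size": 50257}
print(extend_context(base))
```

<p>Calling <code>write_config(extend_context(base))</code> would then produce the <code>mymodel/config.json</code> file.</p>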
<p>We then put the modified <code>config.json</code> into a directory called <code>mymodel</code> and create the model with:</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>cfg <span style="color:#81a1c1">=</span> GPT2Config<span style="color:#81a1c1">.</span>from_pretrained<span style="color:#eceff4">(</span><span style="color:#a3be8c">&#34;mymodel&#34;</span><span style="color:#eceff4">)</span>
</span></span><span style="display:flex;"><span>model <span style="color:#81a1c1">=</span> GPT2LMHeadModel<span style="color:#eceff4">(</span>cfg<span style="color:#eceff4">)</span>
</span></span></code></pre></div><h3 id="optimization">
  Optimization
  <a href="#optimization">§</a>
</h3>

<p>After everything is set up, we can use the Hugging Face API to train the model. The API makes this extremely convenient.</p>
<p>Given the corpus size and the limited resources, I only trained for a single epoch.</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>date_time <span style="color:#81a1c1">=</span> datetime<span style="color:#81a1c1">.</span>now<span style="color:#eceff4">()</span><span style="color:#81a1c1">.</span>strftime<span style="color:#eceff4">(</span><span style="color:#a3be8c">&#34;%m</span><span style="color:#a3be8c">%d</span><span style="color:#a3be8c">%Y-%H-%M-%S&#34;</span><span style="color:#eceff4">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>training_args <span style="color:#81a1c1">=</span> TrainingArguments<span style="color:#eceff4">(</span>
</span></span><span style="display:flex;"><span>    output_dir<span style="color:#81a1c1">=</span><span style="color:#a3be8c">&#39;./results&#39;</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    num_train_epochs<span style="color:#81a1c1">=</span><span style="color:#b48ead">1</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    learning_rate<span style="color:#81a1c1">=</span><span style="color:#b48ead">6e-4</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    per_device_train_batch_size<span style="color:#81a1c1">=</span><span style="color:#b48ead">12</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    gradient_accumulation_steps<span style="color:#81a1c1">=</span><span style="color:#b48ead">12</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    per_device_eval_batch_size<span style="color:#81a1c1">=</span><span style="color:#b48ead">12</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    gradient_checkpointing<span style="color:#81a1c1">=</span><span style="color:#81a1c1;font-weight:bold">True</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    warmup_steps<span style="color:#81a1c1">=</span><span style="color:#b48ead">1000</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    torch_compile<span style="color:#81a1c1">=</span><span style="color:#81a1c1;font-weight:bold">False</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    weight_decay<span style="color:#81a1c1">=</span><span style="color:#b48ead">0.1</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    logging_dir<span style="color:#81a1c1">=</span><span style="color:#a3be8c">f</span><span style="color:#a3be8c">&#39;./logs/</span><span style="color:#a3be8c">{</span>date_time<span style="color:#a3be8c">}</span><span style="color:#a3be8c">&#39;</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    logging_strategy<span style="color:#81a1c1">=</span><span style="color:#a3be8c">&#34;steps&#34;</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    disable_tqdm<span style="color:#81a1c1">=</span><span style="color:#81a1c1;font-weight:bold">False</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    report_to<span style="color:#81a1c1">=</span><span style="color:#a3be8c">&#34;tensorboard&#34;</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    save_total_limit <span style="color:#81a1c1">=</span> <span style="color:#b48ead">3</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    logging_steps<span style="color:#81a1c1">=</span><span style="color:#b48ead">10</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    fp16<span style="color:#81a1c1">=</span><span style="color:#81a1c1;font-weight:bold">True</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    ddp_find_unused_parameters<span style="color:#81a1c1">=</span><span style="color:#81a1c1;font-weight:bold">False</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    dataloader_num_workers<span style="color:#81a1c1">=</span><span style="color:#b48ead">32</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    optim<span style="color:#81a1c1">=</span><span style="color:#a3be8c">&#34;adamw_torch&#34;</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    resume_from_checkpoint<span style="color:#81a1c1">=</span><span style="color:#81a1c1;font-weight:bold">True</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    eval_strategy<span style="color:#81a1c1">=</span><span style="color:#a3be8c">&#34;steps&#34;</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    eval_steps<span style="color:#81a1c1">=</span><span style="color:#b48ead">100</span>
</span></span><span style="display:flex;"><span><span style="color:#eceff4">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>cb <span style="color:#81a1c1">=</span> TextGenerationCallback<span style="color:#eceff4">(</span>tokenizer<span style="color:#81a1c1">=</span>tokenizer<span style="color:#eceff4">,</span> log_dir<span style="color:#81a1c1">=</span><span style="color:#a3be8c">f</span><span style="color:#a3be8c">&#34;./logs/</span><span style="color:#a3be8c">{</span>date_time<span style="color:#a3be8c">}</span><span style="color:#a3be8c">&#34;</span><span style="color:#eceff4">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>trainer <span style="color:#81a1c1">=</span> Trainer<span style="color:#eceff4">(</span>
</span></span><span style="display:flex;"><span>    model<span style="color:#81a1c1">=</span>model<span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    args<span style="color:#81a1c1">=</span>training_args<span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    train_dataset<span style="color:#81a1c1">=</span>train_dataset<span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    eval_dataset<span style="color:#81a1c1">=</span>val_dataset<span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    data_collator<span style="color:#81a1c1">=</span>data_collator<span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>    callbacks<span style="color:#81a1c1">=</span><span style="color:#eceff4">[</span>cb<span style="color:#eceff4">]</span>
</span></span><span style="display:flex;"><span><span style="color:#eceff4">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>trainer<span style="color:#81a1c1">.</span>train<span style="color:#eceff4">()</span>
</span></span></code></pre></div><p>We run the training script with</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>torchrun --nproc_per_node <span style="color:#b48ead">4</span> train.py
</span></span></code></pre></div><p>If the training crashes, you can resume by using</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>trainer<span style="color:#81a1c1">.</span>train<span style="color:#eceff4">(</span>resume_from_checkpoint<span style="color:#81a1c1">=</span><span style="color:#a3be8c">&#34;results/checkpoint-xxx&#34;</span><span style="color:#eceff4">)</span>
</span></span></code></pre></div><h3 id="monitoring">
  Monitoring
  <a href="#monitoring">§</a>
</h3>

<p>Once the training runs, we can use different tools to monitor the process.</p>
<h4 id="nvtop">
  nvtop
  <a href="#nvtop">§</a>
</h4>

<p><code>nvtop</code> displays the utilization of the GPUs.</p>
<figure><img src="/img/german-gpt/nvtop.webp"
    alt="GPUs go BRRRRR"><figcaption>
      <p>GPUs go BRRRRR</p>
    </figcaption>
</figure>

<p>This way, we can, for example, check whether the process already uses most of the available VRAM or whether there is still room to increase the batch size.</p>
<h4 id="tensorboard">
  Tensorboard
  <a href="#tensorboard">§</a>
</h4>

<p>The trainer prints statistics to the terminal at regular intervals. However, Tensorboard provides a web interface to watch training statistics in real time.
Tensorboard can be enabled by the <code>report_to=&quot;tensorboard&quot;</code> argument in the training configuration.
The web interface can then be launched by executing:</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>tensorboard --logdir logs/
</span></span></code></pre></div><figure><img src="/img/german-gpt/tensorboard.webp"
    alt="Live training statistics in Tensorboard"><figcaption>
      <p>Live training statistics in Tensorboard</p>
    </figcaption>
</figure>

<p>By implementing a custom <code>TextGenerationCallback</code>, we can sample from the GPT during training.</p>
<figure><img src="/img/german-gpt/tensorboard2.webp"
    alt="Live text samples in Tensorboard"><figcaption>
      <p>Live text samples in Tensorboard</p>
    </figcaption>
</figure>

<h4 id="plotting">
  Plotting
  <a href="#plotting">§</a>
</h4>

<p>We can also download statistics in JSON format from Tensorboard to process them programmatically.</p>
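<p>A minimal sketch of such processing is shown below. It assumes that each entry of the exported file has the form <code>[wall_time, step, value]</code>; the function names and the sample values are made up for illustration.</p>

```python
import json


def load_scalars(raw: str) -> list:
    """Parse an exported scalar series into (step, value) pairs.

    Assumes each entry has the form [wall_time, step, value].
    """
    return [(int(step), float(value)) for _wall_time, step, value in json.loads(raw)]


def smooth(values: list, weight: float = 0.9) -> list:
    """Exponential moving average to tame the noise of per-step losses."""
    if not values:
        return []
    smoothed, last = [], values[0]
    for value in values:
        last = weight * last + (1 - weight) * value
        smoothed.append(last)
    return smoothed


# Made-up export with three logged loss values
raw = "[[1700000000.0, 10, 9.5], [1700000060.0, 20, 7.5], [1700000120.0, 30, 6.5]]"
series = load_scalars(raw)
print(series)
print(smooth([value for _step, value in series], weight=0.5))
```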
<p>The loss curve over the training period is shown below. Aside from some initial spikes, it follows the expected pattern: a sharp loss drop at first, followed by a gradual decrease as training progresses.</p>
<figure><img src="/img/german-gpt/loss.webp"
    alt="Loss over Training. The gaps in the data indicate crashes of the training script."><figcaption>
      <p>Loss over Training. The gaps in the data indicate crashes of the training script.</p>
    </figcaption>
</figure>

<h4 id="gradient-norm-spikes">
  Gradient Norm Spikes
  <a href="#gradient-norm-spikes">§</a>
</h4>

<p>During training, we can observe an interesting phenomenon: when we look at the norm of the gradient of the loss
$\lVert \nabla_{\theta} \mathcal{L}(x, y) \rVert$ w.r.t. the model&rsquo;s weights $\theta$, we see (plot below) that</p>
<ol>
<li>the norm starts at around $1$ and then quickly decreases. However, we observe some spikes, particularly early in training.
These spikes also correlate with some drastic jumps in the model&rsquo;s loss (see image above).</li>
<li>the gradient norm increases towards the end of the epoch.</li>
</ol>
<p>This magnitude tells us something about how large the updates are that we apply to the model&rsquo;s weights.
It makes intuitive sense to me that we start out with quite large updates at the beginning of the training, and then gradually move towards the minimum of the loss in smaller steps as it becomes more difficult to improve the loss, so the gradient is not as steep.</p>
<p>However, to be honest, I do not know why we observe these jumps and the gradual increase towards the end of the epoch.
If you have any suggestions, feel free to contact me.</p>
<figure><img src="/img/german-gpt/grad-norm.webp"
    alt="Spikes in the norm of the gradient"><figcaption>
      <p>Spikes in the norm of the gradient</p>
    </figcaption>
</figure>
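<p>For reference, the plotted quantity is presumably the global L2 norm over the gradients of all parameters. The sketch below computes it for plain Python lists and also shows clipping by global norm; the function names and toy values are made up, and this is not the code the Trainer runs internally.</p>

```python
import math


def global_grad_norm(grads: list) -> float:
    """L2 norm over all gradient entries of all parameter tensors."""
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))


def clip_by_global_norm(grads: list, max_norm: float = 1.0) -> list:
    """Rescale all gradients so that their global norm is at most max_norm."""
    norm = global_grad_norm(grads)
    scale = min(1.0, max_norm / (norm + 1e-6))
    return [[g * scale for g in tensor] for tensor in grads]


grads = [[3.0, 0.0], [0.0, 4.0]]  # toy gradients of two parameter tensors
print(global_grad_norm(grads))    # 5.0
print(global_grad_norm(clip_by_global_norm(grads)))  # close to 1.0
```

<p>Since the training arguments above do not set <code>max_grad_norm</code>, the Trainer&rsquo;s default of 1.0 should apply, which limits how far a single spike can move the weights.</p>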

<h2 id="evaluation">
  Evaluation
  <a href="#evaluation">§</a>
</h2>

<p>Now that the model is trained, how can we evaluate it?</p>
<h3 id="qualitative">
  Qualitative
  <a href="#qualitative">§</a>
</h3>

<p>One of the first things we can do to assess how good (or bad) the model is, is to simply have a look at some example generations.
For example, we make the model complete the following text:</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#81a1c1;font-weight:bold">from</span> <span style="color:#8fbcbb">transformers</span> <span style="color:#81a1c1;font-weight:bold">import</span> pipeline
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>pipe <span style="color:#81a1c1">=</span> pipeline<span style="color:#eceff4">(</span><span style="color:#a3be8c">&#34;text-generation&#34;</span><span style="color:#eceff4">,</span> model<span style="color:#81a1c1">=</span><span style="color:#a3be8c">&#34;kkirchheim/german-gpt2-medium&#34;</span><span style="color:#eceff4">,</span> device<span style="color:#81a1c1">=</span><span style="color:#a3be8c">&#34;cuda&#34;</span><span style="color:#eceff4">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>text <span style="color:#81a1c1">=</span> pipe<span style="color:#eceff4">(</span><span style="color:#a3be8c">&#34;Der Sinn des Lebens ist&#34;</span><span style="color:#eceff4">,</span> 
</span></span><span style="display:flex;"><span>            max_length<span style="color:#81a1c1">=</span><span style="color:#b48ead">256</span><span style="color:#eceff4">,</span>  
</span></span><span style="display:flex;"><span>            no_repeat_ngram_size<span style="color:#81a1c1">=</span><span style="color:#b48ead">3</span><span style="color:#eceff4">,</span>  
</span></span><span style="display:flex;"><span>            top_k<span style="color:#81a1c1">=</span><span style="color:#b48ead">50</span><span style="color:#eceff4">,</span> 
</span></span><span style="display:flex;"><span>            top_p<span style="color:#81a1c1">=</span><span style="color:#b48ead">0.95</span><span style="color:#eceff4">,</span>
</span></span><span style="display:flex;"><span>            do_sample<span style="color:#81a1c1">=</span><span style="color:#81a1c1;font-weight:bold">True</span>
</span></span><span style="display:flex;"><span> <span style="color:#eceff4">)[</span><span style="color:#b48ead">0</span><span style="color:#eceff4">][</span><span style="color:#a3be8c">&#34;generated_text&#34;</span><span style="color:#eceff4">]</span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1">print</span><span style="color:#eceff4">(</span>text<span style="color:#eceff4">)</span>
</span></span></code></pre></div><p>The result looks like this:
<div class="quote">
    <p>Der Sinn des Lebens ist der Weg in die Freiheit, die wir für uns und unsere Kinder anstreben.</p>
<p>Das Wichtigste dabei ist es, die richtige Entscheidung für eine Lebens- und Sozialform zu treffen.</p>
<p>Denn nur wenn Kinder in einer Familie aufwachsen und Familie ihr Leben selbst bestimmen können, werden sie sich auch in Zukunft in ihrer Persönlichkeit verwirklichen.
Wie wichtig es ist, in einer gesunden und lebenswerten Umwelt zu leben, zeigt sich am großen Anteil von älteren Menschen. Der demografische Wandel ist in vielen Bereichen bereits spürbar und wird viele Menschen immer stärker belasten.</p>
<p>Viele Familien in unserer Region leben seit Generationen im Eigentum. Sie sind in einem Generationenverbund mit ihren Kindern mit ihren eigenen Bedürfnissen und Ideen an den Ort ihrer Wohnumgebung gebunden. Die Generation der Jüngeren lebt zu einem Großteil allein in einer kleinen Wohnsiedlung ohne eigenen Garten, im Altersheim oder als alleinstehende Rentnerin oder Rentner.</p>
<p>Die Lebensbereiche Wohnen, Familie und Gesellschaft rücken in dieser Situation in den Fokus der Gesellschaft und erfordern die Entwicklung von neuen gesellschaftlichen, sozialen und ökonomischen Lebensmodellen.</p>
<p>Mit unserer Gesellschaft und unseren Kindern ist es oft nicht mehr so einfach wie früher, in dieser Lebensphase, sich von einer festen Bindung in die neuen Lebensphase zu lösen.
Neue soziale Systeme müssen deshalb ganz neu entwickelt werden, um</p>

</div></p>
<p>While this reads strangely at times, it does resemble valid German text.</p>
<h3 id="language-modeling">
  Language Modeling
  <a href="#language-modeling">§</a>
</h3>

<p>For English models, there is a plethora of benchmarks that evaluate all kinds of properties of the model, such as its reasoning abilities, its knowledge in certain fields, or its truthfulness. For German text, however, our choices are quite limited.
What we can always do is compare the losses of different models on the same corpus.
This will give you an idea of how well the models can predict the next token.
Instead of comparing the loss, people often compare the <a href="https://en.wikipedia.org/wiki/Perplexity">per-token perplexity</a>, which is a measure of how perplexed the model is by a given text.
Perplexity over a sequence of tokens $w$ with length $N$ is computed as:
$$ PPL(w) = \exp \left(  -\frac{1}{N}  \sum_{i=1}^{N} \log p_{\theta}(w_i \mid w_1, \ldots, w_{i-1}) \right)  $$
so, in essence, it is the exponentiated loss.<sup id="fnref:6"><a href="#fn:6" class="footnote-ref" role="doc-noteref">6</a></sup> In practice, perplexity is often only approximated, as computing it exactly requires $N$ forward passes, which can take a very long time for larger corpora.</p>
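<p>As a minimal sketch of the formula above, assume we already have the log-probability that the model assigned to each observed token. The function name and the toy numbers are made up for illustration.</p>

```python
import math


def perplexity(token_log_probs: list) -> float:
    """Exponential of the negative mean log-probability of the observed tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A model that assigns probability 1/4 to every observed token
uniform = [math.log(0.25)] * 8
print(perplexity(uniform))  # approximately 4
```

<p>A model that is always undecided between four equally likely tokens thus has a perplexity of about 4, which is a handy way to read the metric.</p>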
<p>There are several implementations of the perplexity metric available online, and interestingly, many of them give slightly different results.
So, I went with the implementation from Hugging Face <code>evaluate</code>, which I modified only slightly because it would throw an error for some of the models.</p>
<p>For the evaluation, we took the first 10k articles of the German Wikipedia.</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#81a1c1;font-weight:bold">from</span> <span style="color:#8fbcbb">datasets</span> <span style="color:#81a1c1;font-weight:bold">import</span> load_dataset
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>dataset <span style="color:#81a1c1">=</span> load_dataset<span style="color:#eceff4">(</span><span style="color:#a3be8c">&#34;wikipedia&#34;</span><span style="color:#eceff4">,</span> <span style="color:#a3be8c">&#34;20220301.de&#34;</span><span style="color:#eceff4">,</span> split<span style="color:#81a1c1">=</span><span style="color:#a3be8c">&#34;train&#34;</span><span style="color:#eceff4">)</span>
</span></span><span style="display:flex;"><span>text <span style="color:#81a1c1">=</span> dataset<span style="color:#eceff4">[:</span><span style="color:#b48ead">10000</span><span style="color:#eceff4">][</span><span style="color:#a3be8c">&#34;text&#34;</span><span style="color:#eceff4">]</span>
</span></span></code></pre></div><p><div class="warn">
    We can safely assume that the German Wikipedia was part of the models&rsquo; training data. However, this Wikipedia dump is from 2022, while our model&rsquo;s training data only includes websites scraped up until 2020.
</div>
You can find the resulting perplexity values below:</p>
<figure><img src="/img/german-gpt/perplexity.webp"
    alt="Perplexity of different models on some test data"><figcaption>
      <p>Perplexity of different models on some test data</p>
    </figcaption>
</figure>

<p>As you can see, the Llama model outperforms ours, which is unsurprising, given that it has over $20 \times$ the number of parameters.
Our model, on the other hand, outperforms the smaller German models (also unsurprising, as it is larger and was trained on much more data).
It should be noted that per-token perplexity can be difficult to compare between models with different tokenizers, so I am not entirely sure how to interpret the performance difference to Llama 3.
However, the German models all use the same tokenizer.</p>
<p>I am not entirely sure why the <code>stefan-it</code> model performs so poorly.
According to the model card, it is basically a variant of the <code>dbmdz</code> model trained on much more data, so you would expect it to perform better.</p>
<h3 id="memory-footprint">
  Memory Footprint
  <a href="#memory-footprint">§</a>
</h3>

<p>Model quantization can be used to reduce the VRAM required for inference. The table below shows the peak GPU memory required for generating 1024 tokens.
As we can see, our model requires more VRAM than the <code>stefan-it</code> model (and, similarly, <code>dbmdz</code>, which has the same architecture), but still significantly less than the Llama model.</p>
<table>
  <caption>VRAM Usage Comparison in MB (1k Tokens)</caption>
  <tr>
    <th>Quantization Level</th>
    <th>Ours</th>
    <th>stefan-it/<br>german-gpt2-larger</th>
    <th>meta-llama/<br>Meta-Llama-3-8B</th>
  </tr>
  <tr>
    <td>fp32</td>
    <td>2242.32</td>
    <td>641.44</td>
    <td>31218.98</td>
  </tr>
  <tr>
    <td>fp16</td>
    <td>1174.63</td>
    <td>341.87</td>
    <td>15614.70</td>
  </tr>
  <tr>
    <td>int8</td>
    <td>910.54</td>
    <td>260.32</td>
    <td>8970.10</td>
  </tr>
  <tr>
    <td>int4</td>
    <td>771.42</td>
    <td>219.27</td>
    <td>6126.09</td>
  </tr>
</table>
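<p>To make the table easier to read, the sketch below computes how much each quantization level reduces peak memory relative to fp32. The numbers are copied from the table; the function and variable names are made up for illustration.</p>

```python
# Peak VRAM in MB, copied from the table above
vram_mb = {
    "ours": {"fp32": 2242.32, "fp16": 1174.63, "int8": 910.54, "int4": 771.42},
    "stefan-it/german-gpt2-larger": {"fp32": 641.44, "fp16": 341.87, "int8": 260.32, "int4": 219.27},
    "meta-llama/Meta-Llama-3-8B": {"fp32": 31218.98, "fp16": 15614.70, "int8": 8970.10, "int4": 6126.09},
}


def reduction_factors(usage: dict) -> dict:
    """Memory reduction of each level relative to fp32 (fp32 itself is 1.0)."""
    base = usage["fp32"]
    return {level: round(base / mb, 2) for level, mb in usage.items()}


for model, usage in vram_mb.items():
    print(model, reduction_factors(usage))
```

<p>For our model, going from fp32 to int4 reduces peak memory by a factor of roughly 2.9, well below the factor of 8 that the bit widths alone would suggest. A plausible but unverified explanation is that activations, the key-value cache, and layers kept in higher precision do not shrink along with the quantized weights.</p>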
<h3 id="inference-speed">
  Inference Speed
  <a href="#inference-speed">§</a>
</h3>

<h4 id="full-precision">
  Full Precision
  <a href="#full-precision">§</a>
</h4>

<p>Measuring the time that each model requires to generate 1k tokens on an A100 reveals that our model is approximately twice as slow as the smaller <code>stefan-it</code> model, but still twice as fast as the Llama model.</p>
<figure><img src="/img/german-gpt/inference-speed-noquant.webp"
    alt="Generated tokens per second on an A100"><figcaption>
      <p>Generated tokens per second on an A100</p>
    </figcaption>
</figure>

<h4 id="quantization">
  Quantization
  <a href="#quantization">§</a>
</h4>

<p>While one could assume that quantization also accelerates inference (as I did), this does not seem to be the case.
Below, you can see histograms of the time required to sample 1024 tokens from our model on an A100 at different quantization levels.
We use histograms rather than a single average because they also show how the timings are distributed.</p>
<figure><img src="/img/german-gpt/inference-speed.webp"
    alt="Time required to generate 1024 tokens with different levels of quantization on an A100"><figcaption>
      <p>Time required to generate 1024 tokens with different levels of quantization on an A100</p>
    </figcaption>
</figure>
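<p>Timings like these can be collected with a small loop such as the sketch below. It is not the script used for the plots: <code>time_runs</code> is a made-up helper, and the workload is a stand-in for a call to the model&rsquo;s generation routine.</p>

```python
import statistics
import time


def time_runs(generate, repeats: int = 5) -> list:
    """Return wall-clock durations in seconds of repeated calls to generate()."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        generate()
        # On a GPU, wait for pending kernels (e.g. torch.cuda.synchronize())
        # before stopping the clock, otherwise the time is underestimated.
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in workload; in practice this would sample 1024 tokens from the model
durations = time_runs(lambda: sum(range(100_000)), repeats=5)
print(f"median: {statistics.median(durations):.4f}s over {len(durations)} runs")
```

<p>Keeping the individual durations instead of only their mean is what makes a histogram possible in the first place.</p>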

<h2 id="lessons-learned">
  Lessons Learned
  <a href="#lessons-learned">§</a>
</h2>

<p>Throughout collecting data, implementing the training script, and finally evaluating the model, I learned several lessons.</p>
<p><strong>Crashes Happen</strong>
You might have noticed gaps in the previous plots. One key lesson I learned is that training can unexpectedly be interrupted, even when there’s no apparent reason. For instance, if the disk becomes full and the Hugging Face Trainer tries to save a new model checkpoint, it crashes. Without prior checkpointing, this can mean a lot of wasted compute.</p>
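<p>A small amount of defensive code helps here. The sketch below locates the most recent <code>checkpoint-N</code> directory to resume from and reports the free disk space; the helper names are made up. The <code>transformers</code> library also ships a similar helper, <code>get_last_checkpoint</code>, in <code>transformers.trainer_utils</code>.</p>

```python
import re
import shutil
import tempfile
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the checkpoint-N directory with the highest step N, or None."""
    candidates = []
    for path in Path(output_dir).glob("checkpoint-*"):
        match = re.fullmatch(r"checkpoint-(\d+)", path.name)
        if match and path.is_dir():
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


def free_gib(path):
    """Free disk space at path in GiB, worth checking before a long run."""
    return shutil.disk_usage(path).free / 2**30


# Demo on a throwaway directory with fake checkpoint folders
with tempfile.TemporaryDirectory() as tmp:
    for step in (500, 1000, 1500):
        (Path(tmp) / f"checkpoint-{step}").mkdir()
    found = latest_checkpoint(tmp)
    free_space = free_gib(tmp)
    print(found.name, f"{free_space:.1f} GiB free")
```

<p>Comparing the step numbers as integers matters: a plain string sort would rank <code>checkpoint-500</code> after <code>checkpoint-1500</code>.</p>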
<p><strong>Batch-Size Matters</strong>
Initially, I trained the model with a moderate batch size. However, it turns out that this leads to a loss plateau early on.
In my search for solutions to this problem, I had a look at the hyperparameters in Karpathy&rsquo;s <a href="https://github.com/karpathy/nanoGPT">nanoGPT</a> and noticed that this implementation uses much larger batch sizes.</p>
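<p>For reference, the effective batch size implied by the training arguments shown earlier can be computed as below. This is a sketch under two assumptions: those arguments are the final ones, and every training sequence is packed to the full context length of 2048 tokens. The function name is made up.</p>

```python
def effective_batch_size(per_device: int, accumulation_steps: int, num_processes: int) -> int:
    """Number of sequences that contribute to a single optimizer step."""
    return per_device * accumulation_steps * num_processes


# 12 sequences per GPU, 12 accumulation steps, 4 processes started by torchrun
sequences = effective_batch_size(per_device=12, accumulation_steps=12, num_processes=4)
tokens = sequences * 2048  # assumes sequences packed to the full context length
print(sequences, tokens)   # 576 1179648
```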
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>To my knowledge, the largest and best purely German models are <a href="https://huggingface.co/dbmdz/german-gpt2">dbmdz/german-gpt2</a> and <a href="https://huggingface.co/stefan-it/german-gpt2-larger">stefan-it/german-gpt2-larger</a>. The latter is trained on the same corpus, but only on 90GB of the CommonCrawl.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:2">
<p>According to the information provided on huggingface.&#160;<a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:3">
<p>Concerning the EU AI Act, which will be enacted soon, this is still legal for research purposes in Europe. I assume that the EU AI Act is the reason that some recently released Llama models are not available in the EU: Meta does not want to get sued.&#160;<a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:4">
<p>The format is not exactly JSON, but serialized Python. On the common-crawl website, there is example code that demonstrates how to load data in this format.&#160;<a href="#fnref:4" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:5">
<p>A tutorial is provided <a href="https://huggingface.co/blog/how-to-train">here</a>.&#160;<a href="#fnref:5" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:6">
<p>There is an excellent post on perplexity available on <a href="https://thegradient.pub/understanding-evaluation-metrics-for-language-models/">the Gradient</a>. There is also a paper describing <a href="https://arxiv.org/abs/2106.00085">alternative evaluation strategies</a>.&#160;<a href="#fnref:6" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>
]]></content:encoded></item><item><title>Language Models as Reasoners for Out-of-Distribution Detection</title><link>https://www.kkirchheim.de/papers/llm-ood/</link><pubDate>Tue, 17 Sep 2024 00:00:00 +0000</pubDate><guid>https://www.kkirchheim.de/papers/llm-ood/</guid><description>&lt;p&gt;Our paper, &lt;strong&gt;Language Models as Reasoners for Out-of-Distribution Detection&lt;/strong&gt;, was presented at the &lt;a href="https://www.waise.org/"&gt;Workshop on AI Safety Engineering&lt;/a&gt; (WAISE) 2024 and received the best paper award by popular vote.&lt;/p&gt;
&lt;p&gt;It constitutes an extension of our idea of &lt;a href="https://www.kkirchheim.de/papers/logic-ood/"&gt;Out-of-Distribution Detection with Logical Reasoning&lt;/a&gt;, where we replaced the Prolog-based reasoning component with an LLM.&lt;/p&gt;
&lt;h3 id="abstract"&gt;
Abstract
&lt;a href="#abstract"&gt;§&lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;Deep neural networks (DNNs) are prone to making wrong predictions with high confidence for data that does not stem from their training distribution. Consequentially, out-of-distribution (OOD) detection is important in safety-critical applications, as it identifies such inputs. Using prior knowledge about the training distribution through formal constraints has shown promise in enhancing OOD detection. However, developing and maintaining formal knowledge bases can be cumbersome. Large language models (LLMs) have recently excelled in various natural language processing tasks. In this study, we investigate the use of LLMs for OOD detection, where domain constraints are expressed in natural language. Our results indicate that LLMs can outperform random guessing by leveraging general world knowledge learned during training. Moreover, LLMs can par with methods based on formal constraints when supplemented with domain-specific constraints articulated in natural language.&lt;/p&gt;</description><content:encoded><![CDATA[<p>Our paper, <strong>Language Models as Reasoners for Out-of-Distribution Detection</strong>, was presented at the <a href="https://www.waise.org/">Workshop on AI Safety Engineering</a> (WAISE) 2024 and received the best paper award by popular vote.</p>
<p>It constitutes an extension of our idea of <a href="/papers/logic-ood/">Out-of-Distribution Detection with Logical Reasoning</a>, where we replaced the Prolog-based reasoning component with an LLM.</p>
<h3 id="abstract">
  Abstract
  <a href="#abstract">§</a>
</h3>

<p>Deep neural networks (DNNs) are prone to making wrong predictions with high confidence for data that does not stem from their training distribution. Consequentially, out-of-distribution (OOD) detection is important in safety-critical applications, as it identifies such inputs. Using prior knowledge about the training distribution through formal constraints has shown promise in enhancing OOD detection. However, developing and maintaining formal knowledge bases can be cumbersome. Large language models (LLMs) have recently excelled in various natural language processing tasks. In this study, we investigate the use of LLMs for OOD detection, where domain constraints are expressed in natural language. Our results indicate that LLMs can outperform random guessing by leveraging general world knowledge learned during training. Moreover, LLMs can par with methods based on formal constraints when supplemented with domain-specific constraints articulated in natural language.</p>
<h3 id="presentation">
  Presentation
  <a href="#presentation">§</a>
</h3>

<p>The presentation slides are available <a href="/pdf/llm-ood-presentation.pdf">here</a>.</p>
]]></content:encoded></item><item><title>Deep learning-based harmonization and super-resolution of Landsat-8 and Sentinel-2 images</title><link>https://www.kkirchheim.de/papers/deep-harmonization/</link><pubDate>Fri, 17 May 2024 00:00:00 +0000</pubDate><guid>https://www.kkirchheim.de/papers/deep-harmonization/</guid><description>&lt;p&gt;Our paper &lt;strong&gt;Deep learning-based harmonization and super-resolution of Landsat-8 and Sentinel-2 images&lt;/strong&gt;, which is based on the master's thesis of my colleague Venkatesh Thirugnana Sambandham, has been published in the ISPRS Journal of Photogrammetry and Remote Sensing. This work is an extension of our previous workshop paper on &lt;a href="https://www.kkirchheim.de/papers/transformer-for-satelite-homogenization/"&gt;transformers for satellite homogenization&lt;/a&gt;.
In summary, we find that a simple UNet model provides surprisingly good performance for the satellite homogenization task.&lt;/p&gt;
&lt;p&gt;We demonstrate that this 100M parameter model&lt;/p&gt;</description><content:encoded><![CDATA[<p>Our paper <strong>Deep learning-based harmonization and super-resolution of Landsat-8 and Sentinel-2 images</strong>, which is based on the master's thesis of my colleague Venkatesh Thirugnana Sambandham, has been published in the ISPRS Journal of Photogrammetry and Remote Sensing. This work is an extension of our previous workshop paper on <a href="/papers/transformer-for-satelite-homogenization/">transformers for satellite homogenization</a>.
In summary, we find that a simple UNet model provides surprisingly good performance for the satellite homogenization task.</p>
<p>We demonstrate that this 100M parameter model</p>
<ul>
<li>can enhance the spatial resolution of satellite images</li>
<li>is able to increase the availability of cloud-free images by 21% on average</li>
<li>can thereby provide benefits for downstream tasks, like crop segmentation</li>
<li>generalizes well to different regions of the world</li>
<li>is able to provide uncertainty estimates</li>
</ul>
<p>The model is also available on Hugging Face, so you can easily test it on your own images:</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#81a1c1;font-weight:bold">from</span> <span style="color:#8fbcbb">transformers</span> <span style="color:#81a1c1;font-weight:bold">import</span> AutoModel
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>model <span style="color:#81a1c1">=</span> AutoModel<span style="color:#81a1c1">.</span>from_pretrained<span style="color:#eceff4">(</span>
</span></span><span style="display:flex;"><span>    <span style="color:#a3be8c">&#34;venkatesh-thiru/s2l8h-UNet-5depth-upsample&#34;</span><span style="color:#eceff4">,</span> 
</span></span><span style="display:flex;"><span>    trust_remote_code<span style="color:#81a1c1">=</span><span style="color:#81a1c1;font-weight:bold">True</span>
</span></span><span style="display:flex;"><span><span style="color:#eceff4">)</span>
</span></span></code></pre></div><h3 id="abstract">
  Abstract
  <a href="#abstract">§</a>
</h3>

<p>Multi-spectral satellite images of the Earth’s surface are used in various applications, from water quality assessment and urban planning to climate monitoring, disaster response, infrastructure oversight, and agricultural surveillance. Many of these applications would benefit from higher spatial and temporal resolution of observations, which could be achieved by combining observations from several sources. This study introduces a deep learning-based pipeline to harmonize the spectral and spatial discrepancies between the Landsat-8 and Sentinel-2 Earth Observation satellites. Through established image quality metrics, we demonstrate a significant enhancement in the spatial resolution of Landsat-8 images. Field observation experiments show that leveraging unified images from both satellites increases the availability of cloud-free images by 21% annually on average in our study area. Additionally, our pipeline enhances the Normalized Difference Vegetation Index (NDVI) correlation between Landsat-8 and Sentinel-2 observations by about 4.9%, offering significant performance gains in a downstream crop segmentation task. Our 100M parameter model, trained on European data, generalizes to most regions with only minor limitations. Furthermore, we show that the pipeline can provide uncertainty estimates for its outputs, which are valuable for decision-making in downstream applications.</p>
<figure><img src="/img/satelite-harmo/example.webp"
    alt="Satellite images upsampled by our model compared to baseline"><figcaption>
      <p>Satellite images upsampled by our model compared to baseline</p>
    </figcaption>
</figure>

]]></content:encoded></item><item><title>Out-of-Distribution Detection with Logical Reasoning</title><link>https://www.kkirchheim.de/papers/logic-ood/</link><pubDate>Thu, 04 Jan 2024 00:00:00 +0000</pubDate><guid>https://www.kkirchheim.de/papers/logic-ood/</guid><description>&lt;p&gt;Our paper &lt;strong&gt;Out-of-Distribution Detection with Logical Reasoning&lt;/strong&gt; has been accepted at WACV 2024.&lt;/p&gt;
&lt;h3 id="abstract"&gt;
Abstract
&lt;a href="#abstract"&gt;§&lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;Machine Learning models often only generalize reliably
to samples from the training distribution. Consequentially,
detecting when input data is out-of-distribution (OOD) is
crucial, especially in safety-critical applications. Current
OOD detection methods, however, tend to be domain agnostic and often fail to incorporate valuable prior knowledge
about the structure of the training distribution. To address
this limitation, we introduce a novel, hybrid OOD detection
algorithm that combines a deep learning-based perception
system with a first-order logic-based knowledge representation. A logical reasoning system uses this knowledge base
at run-time to infer whether inputs are consistent with prior
knowledge about the training distribution. In contrast to
purely neural systems, the structured knowledge representation allows humans to inspect and modify the rules that
govern the OOD detectors’ behavior. This not only enhances
performance but also fosters a level of explainability that is
particularly beneficial in safety-critical contexts. We demonstrate
the effectiveness of our method through experiments
on several datasets and discuss advantages and limitations.&lt;/p&gt;</description><content:encoded><![CDATA[<p>Our paper <strong>Out-of-Distribution Detection with Logical Reasoning</strong> has been accepted at WACV 2024.</p>
<h3 id="abstract">
  Abstract
  <a href="#abstract">§</a>
</h3>

<p>Machine Learning models often only generalize reliably
to samples from the training distribution. Consequentially,
detecting when input data is out-of-distribution (OOD) is
crucial, especially in safety-critical applications. Current
OOD detection methods, however, tend to be domain agnostic and often fail to incorporate valuable prior knowledge
about the structure of the training distribution. To address
this limitation, we introduce a novel, hybrid OOD detection
algorithm that combines a deep learning-based perception
system with a first-order logic-based knowledge representation. A logical reasoning system uses this knowledge base
at run-time to infer whether inputs are consistent with prior
knowledge about the training distribution. In contrast to
purely neural systems, the structured knowledge representation allows humans to inspect and modify the rules that
govern the OOD detectors’ behavior. This not only enhances
performance but also fosters a level of explainability that is
particularly beneficial in safety-critical contexts. We demonstrate
the effectiveness of our method through experiments
on several datasets and discuss advantages and limitations.</p>
<h3 id="video">
  Video
  <a href="#video">§</a>
</h3>

<p>Below, you can find the presentation video I created for the conference. I used OpenAI's API both for writing the script and for voice synthesis.
The total production cost was $0.15.</p>

<video class="video-shortcode" preload="auto" controls>
    <source src="/video/wacv24.mp4" type="video/mp4">
    There should have been a video here but your browser does not seem
    to support it.
</video>

]]></content:encoded></item><item><title>Towards Deep Anomaly Detection with Structured Knowledge Representations</title><link>https://www.kkirchheim.de/papers/sumnist/</link><pubDate>Thu, 15 Jun 2023 00:00:00 +0000</pubDate><guid>https://www.kkirchheim.de/papers/sumnist/</guid><description>&lt;p&gt;My paper &lt;strong&gt;Towards Deep Anomaly Detection with Structured Knowledge Representations&lt;/strong&gt; has been accepted at the Workshop on AI Safety Engineering at SafeComp.&lt;/p&gt;
&lt;h3 id="abstract"&gt;
Abstract
&lt;a href="#abstract"&gt;§&lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;Machine learning models tend to only make reliable predictions for inputs that are similar to the training data.
Consequentially, anomaly detection, which can be used to detect unusual inputs, is critical for ensuring the safety of machine learning agents operating in open environments. In this work, we identify and discuss several limitations of current anomaly detection methods, such as their weak performance on tasks that require abstract reasoning, the inability to integrate background knowledge, and the opaqueness that undermines their trustworthiness in critical applications. Furthermore, we propose an architecture for anomaly detection models that aims to integrate structured knowledge representations to address these limitations. Our hypothesis is that this approach can improve performance and robustness, reduce the required resources (such as data and computation), and provide a higher degree of transparency. As a result, our work contributes to the increased safety of machine learning systems.&lt;/p&gt;</description><content:encoded><![CDATA[<p>My paper <strong>Towards Deep Anomaly Detection with Structured Knowledge Representations</strong> has been accepted at the Workshop on AI Safety Engineering at SafeComp.</p>
<h3 id="abstract">
  Abstract
  <a href="#abstract">§</a>
</h3>

<p>Machine learning models tend to only make reliable predictions for inputs that are similar to the training data.
Consequentially, anomaly detection, which can be used to detect unusual inputs, is critical for ensuring the safety of machine learning agents operating in open environments. In this work, we identify and discuss several limitations of current anomaly detection methods, such as their weak performance on tasks that require abstract reasoning, the inability to integrate background knowledge, and the opaqueness that undermines their trustworthiness in critical applications. Furthermore, we propose an architecture for anomaly detection models that aims to integrate structured knowledge representations to address these limitations. Our hypothesis is that this approach can improve performance and robustness, reduce the required resources (such as data and computation), and provide a higher degree of transparency. As a result, our work contributes to the increased safety of machine learning systems.</p>
<figure><img src="/img/sumnist/sumnist.webp"
    alt="SuMNIST: Can you find the anomaly? State-of-the-Art models fail at this task"><figcaption>
      <p>SuMNIST: Can you find the anomaly? State-of-the-Art models fail at this task</p>
    </figcaption>
</figure>

]]></content:encoded></item><item><title>Mining the Bundestag</title><link>https://www.kkirchheim.de/blog/bundestag-mining/</link><pubDate>Sun, 22 Jan 2023 00:00:00 +0000</pubDate><guid>https://www.kkirchheim.de/blog/bundestag-mining/</guid><description>&lt;p&gt;&lt;span class="dropcap"&gt;D&lt;/span&gt;&lt;span class="dropcap-rest"&gt;id you know&lt;/span&gt; that the German parliament publishes protocols for all of its proceedings in PDF format?
It is relatively straightforward to &lt;a href="https://git.kondas.de/kkirchheim/bundestag-mining/"&gt;download&lt;/a&gt; and parse them, so we can easily collect a dataset of transcripts of what seems to be every speech in the Bundestag since the Second World War.&lt;/p&gt;
&lt;p&gt;My original idea was to mine the speeches for word associations. Some words will be associated with other words based on the intended connotation, and this association might change over time as the connotations change.
Also, these associations can probably be correlated to individual parties.
Furthermore, this dataset could be used to automatically identify emerging topics.&lt;/p&gt;</description><content:encoded><![CDATA[<p><span class="dropcap">D</span><span class="dropcap-rest">id you know</span> that the German parliament publishes protocols for all of its proceedings in PDF format?
It is relatively straightforward to <a href="https://git.kondas.de/kkirchheim/bundestag-mining/">download</a> and parse them, so we can easily collect a dataset of transcripts of what seems to be every speech in the Bundestag since the Second World War.</p>
<p>My original idea was to mine the speeches for word associations. Some words will be associated with other words based on the intended connotation, and this association might change over time as the connotations change.
Also, these associations can probably be correlated to individual parties.
Furthermore, this dataset could be used to automatically identify emerging topics.</p>
<h2 id="protocols">
  Protocols
  <a href="#protocols">§</a>
</h2>

<h3 id="scraping">
  Scraping
  <a href="#scraping">§</a>
</h3>

<p>Scraping the PDF documents is not difficult; extracting the text from the PDFs, however, can be a bit tedious.
In the end, I had to use an OCR solution based on Tesseract that reads the text from some hand-selected locations in the documents.
The results are far from perfect, and some additional pre-processing is required.</p>
<h3 id="language-model">
  Language Model
  <a href="#language-model">§</a>
</h3>

<p>Now that we have a dataset of transcripts from the Bundestag, creating a GPT-2-based language model that generates speeches is not particularly difficult. Below is an example output:</p>
<div class="quote">
    <p>Präsident Dr. Wolfgang Schäuble:</p>
<p>Letzter Redner in dieser Debatte ist der Kollege Michael Brand für die CDU/CSU-Fraktion.</p>
<p>(Beifall bei der CDU/CSU)</p>
<p>Michael Brand (CDU/CSU):</p>
<p>Vielen Dank, Herr Präsident, - Sehr geehrte Damen und Herren Abgeordnete! Die Zustände in den Flüchtlingslagern der Vereinten Nationen sind unerträglich.</p>
<p>Wir müssen jetzt schnell Lösungen finden.</p>
<p>Die Vereinten Nationen, das ist die UN-Vollversammlung, haben die Krise des UN-Flüchtlingsprogramms 2015 beschlossen. Die Krise hat zu massiven Versorgungsengpässen geführt, und das bei so viel Leid und Elend in den Lagern, Die Situation der vielen Flüchtlinge, die vor Krieg, Konflikten und vor allem vor den Toren der Hauptstadt Schutz und Hilfe suchen, ist katastrophal. Der UNHCR, der weltweit die Vergabe von humanitärer Hilfe, auch durch die Vereinten Nationen, durchführt, ist einer der großen Arbeitgeber in diesem Bereich. Es ist ein großer Erfolg, dass der UN-Plan 2015 heute veröffentlicht wurde. Das zeigt: Das UN-Flüchtlingsprogramm ist ein Erfolg - und das wird auch so bleiben. Die Lage der Flüchtlinge in Deutschland hat sich im Vergleich zu den Jahren davor nicht weiter entspannt, im Gegenteil: Die UN-Lager in Deutschland sind überbelegt und überlastet. Das ist ein Problem, dem wir uns auch im nächsten Jahr stellen müssen.</p>
<p>Die Situation in den Flüchtlingslagern ist weiterhin sehr belastet. Es gibt immer noch viele Binnenflüchtlinge. Es gibt immer noch viele Flüchtlinge, die versuchen, in Nachbarländer zu kommen. Die Lage der Flüchtlinge im Libanon ist extrem. Die Lage in Afghanistan ist ebenfalls extrem, Die Situation in Pakistan und in Afghanistan ist ebenfalls extrem, und es gibt weiterhin Flüchtlinge, die aus den Flüchtlingslagern nach Indien oder aus Nordafghanistan in den Norden oder in den Irak kommen.</p>
<p>Im Jahr 2018 - das wurde schon erwähnt - wird es eine große Zahl an Asylbewerbern geben, die zu uns gekommen sind, und wir werden in diesem Jahr insgesamt über 1 Million Flüchtlinge und Einwanderer haben, Diese Menschen brauchen eine sichere, legale, aber faire und Unterstützung, Die Vereinten Nationen müssen jetzt die richtigen Signale aussenden und die richtigen Signale aussenden.</p>
<p>Wir können es uns nicht leisten, diese Menschen alleine zu lassen, Deshalb haben wir heute das Recht, darüber zu entscheiden, ob die Situation weiterhin für die Menschlichkeit haft oder für die Rechtsstaatlichkeit in den Flüchtlingslagern sprechen.</p>
<p>Wir sollten nicht in Abwägung kommen, ob diese Menschen, die wir haben, weiterhin Schutz und humanitäre Hilfe brauchen, und wir müssen ihnen auch weiterhin eine Heimat bieten, in denen sie leben können und in denen die Regeln, die Gesetze und die Regeln der Gesellschaft gelten, Wir dürfen nicht zulassen, dass diese Menschen in den Lagern, von denen wir in den vergangenen Jahren gesprochen haben, wieder in ihre Heimat zurückkehren. Das ist unser gemeinsamer Anspruch, auch im Interesse der Vereinten Nationen.</p>

</div>
<h2 id="video-recordings">
  Video Recordings
  <a href="#video-recordings">§</a>
</h2>

<p>The Bundestag has also been publishing video recordings of all speeches for a couple of years now, so there are huge volumes of high-resolution video and audio data (as well as transcriptions?), recorded in a standardized environment, of every major German politician publicly available on the internet. What could possibly go wrong?</p>
<p>Using the videos is allowed for educational purposes.
The following might give you an idea (educate you) about what might be possible with this data:
Using pre-trained face detectors and Tesseract, we can extract faces, names, and party membership information from the videos.</p>
<figure><img src="/img/bdt/faces-sample.jpg"
    alt="Random sample of crops from the video recordings."><figcaption>
      <p>Random sample of crops from the video recordings.</p>
    </figcaption>
</figure>

<h3 id="autoencoders">
  Autoencoders
  <a href="#autoencoders">§</a>
</h3>

<p>An Autoencoder (AE) is a simple neural network architecture that can be used for dimensionality reduction.
It takes an input image $x$, which is then sent through an encoder $E(x)$, which compresses the input into a lower-dimensional
latent representation $z$. This $z$ is then processed by a decoder $D(z)$, which decompresses $z$ back into $\hat{x}$. During training,
we optimize the encoder and decoder jointly to minimize (in this case) the sum of the squared errors between $x$ and $\hat{x}$.
This way, the AE learns to efficiently reduce the dimensionality of the input, while ensuring that the original input can be reconstructed.</p>
<figure><img src="/img/bdt/ae.svg"
    alt="Architecture of an Autoencoder"><figcaption>
      <p>Architecture of an Autoencoder</p>
    </figcaption>
</figure>
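<p>As a rough illustration of the training objective described above, the following sketch trains a tiny linear autoencoder in plain Python. It is not the deep convolutional model used for the face crops: the 2-D toy data, the 1-D latent space, the learning rate, and all function names are assumptions chosen to keep the example self-contained.</p>

```python
def encode(w, x):
    """Encoder E(x): compress a 2-D input into a 1-D latent value z."""
    return w[0] * x[0] + w[1] * x[1]


def decode(v, z):
    """Decoder D(z): expand the latent value z back into a 2-D reconstruction."""
    return [v[0] * z, v[1] * z]


def reconstruction_loss(w, v, data):
    """Sum of squared errors between each input x and its reconstruction."""
    total = 0.0
    for x in data:
        x_hat = decode(v, encode(w, x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total


def train_step(w, v, data, lr):
    """One gradient-descent step that updates encoder and decoder jointly."""
    grad_w, grad_v = [0.0, 0.0], [0.0, 0.0]
    for x in data:
        z = encode(w, x)
        err = [a - b for a, b in zip(decode(v, z), x)]
        back = err[0] * v[0] + err[1] * v[1]  # error propagated back to z
        for k in range(2):
            grad_v[k] += 2 * err[k] * z
            grad_w[k] += 2 * back * x[k]
    w = [w[k] - lr * grad_w[k] for k in range(2)]
    v = [v[k] - lr * grad_v[k] for k in range(2)]
    return w, v


# Toy data: points on a line, so a single latent dimension is sufficient.
data = [[t, 2 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, v = [0.1, 0.1], [0.1, 0.1]  # arbitrary small initial weights

initial_loss = reconstruction_loss(w, v, data)
for _ in range(500):
    w, v = train_step(w, v, data, lr=0.01)
final_loss = reconstruction_loss(w, v, data)

print(initial_loss, final_loss)
```

<p>A real model would replace the two linear maps with convolutional networks and leave the gradient computation to a deep learning framework, but the objective stays the same.</p>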

<p>AEs can be used for clustering since similar inputs tend to be close to each other in the latent space.</p>
<figure><img src="/img/bdt/embedding.jpg"
    alt="t-SNE of the latent space generated by a deep convolutional autoencoder."><figcaption>
      <p>t-SNE of the latent space generated by a deep convolutional autoencoder.</p>
    </figcaption>
</figure>

<p>We can also use AEs to interpolate between different inputs.
Given two images $x_1$ and $x_n$, we can calculate the point in the latent space for each of them by passing them through the encoder.
We can then interpolate between both points in the latent space and send each of the resulting latent representations $z_1, &hellip;, z_n$ through the decoder to produce a video $\hat{x}_1, &hellip;, \hat{x}_n$ that shows a smooth transition between the original images.
Below is a video that interpolates between images of three different politicians.</p>
<figure><img src="/img/bdt/interpolate.gif"
    alt="Interpolating between points in the latent space." width="256px"><figcaption>
      <p>Interpolating between points in the latent space.</p>
    </figcaption>
</figure>
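<p>The interpolation step itself can be sketched in a few lines of plain Python. The latent vectors below are made-up placeholders for the encoder outputs of two images, and <code>interpolate</code> is a hypothetical helper, not code from the original project.</p>

```python
def interpolate(z_start, z_end, n):
    """Return n latent vectors evenly spaced on the line from z_start to z_end."""
    if n < 2:
        raise ValueError("n must be at least 2")
    steps = []
    for i in range(n):
        alpha = i / (n - 1)  # 0.0 at z_start, 1.0 at z_end
        steps.append([(1 - alpha) * a + alpha * b for a, b in zip(z_start, z_end)])
    return steps


# Hypothetical latent codes; in practice these would be E(x_1) and E(x_n).
z_1 = [0.0, 2.0, -1.0]
z_n = [1.0, 0.0, 1.0]

path = interpolate(z_1, z_n, n=5)
for z in path:
    print(z)  # each z would be passed through the decoder D(z) to get one frame
```

<p>Decoding the vectors in <code>path</code> in order yields the frames of the transition; chaining several such paths gives a video that passes through more than two faces.</p>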

<p>There are different variations of AEs, many of which impose additional constraints on the latent space to induce a specific structure, such as variational AEs. Interestingly, in this case, the vanilla AE seems to be sufficient to learn a &ldquo;smooth&rdquo; latent space that can be used for interpolation.
</p>
]]></content:encoded></item><item><title>Mining tagesschau.de</title><link>https://www.kkirchheim.de/blog/tagesschau/</link><pubDate>Sat, 26 Nov 2022 00:00:00 +0000</pubDate><guid>https://www.kkirchheim.de/blog/tagesschau/</guid><description>&lt;p&gt;I like to read &lt;a href="https://tagesschau.de"&gt;tagesschau.de&lt;/a&gt;, so I wrote a &lt;a href="https://git.kondas.de/kkirchheim/tagesschau-mining/"&gt;script&lt;/a&gt; to scrape it at regular intervals.&lt;/p&gt;
&lt;p&gt;My original goal was to determine which articles stay on the front page the longest, which ones allow commenting (a feature that seems to have been disabled almost entirely since March 2020), and whether articles are modified after their initial release without any notice, because I sometimes feel that headlines change.&lt;/p&gt;
&lt;h2 id="dataset-creation"&gt;
Dataset Creation
&lt;a href="#dataset-creation"&gt;§&lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;Tagesschau provides a JSON API, so fetching all of the articles is relatively straightforward and can be done with just a few lines of code.&lt;/p&gt;</description><content:encoded><![CDATA[<p>I like to read <a href="https://tagesschau.de">tagesschau.de</a>, so I wrote a <a href="https://git.kondas.de/kkirchheim/tagesschau-mining/">script</a> to scrape it at regular intervals.</p>
<p>My original goal was to determine which articles stay on the front page the longest, which ones allow commenting (a feature that seems to have been disabled almost entirely since March 2020), and whether articles are modified after their initial release without any notice, because I sometimes feel that headlines change.</p>
<h2 id="dataset-creation">
  Dataset Creation
  <a href="#dataset-creation">§</a>
</h2>

<p>Tagesschau provides a JSON API, so fetching all of the articles is relatively straightforward and can be done with just a few lines of code.</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#81a1c1;font-weight:bold">from</span> <span style="color:#8fbcbb">datetime</span> <span style="color:#81a1c1;font-weight:bold">import</span> datetime
</span></span><span style="display:flex;"><span><span style="color:#81a1c1;font-weight:bold">from</span> <span style="color:#8fbcbb">os.path</span> <span style="color:#81a1c1;font-weight:bold">import</span> join
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1;font-weight:bold">import</span> <span style="color:#8fbcbb">requests</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>root <span style="color:#81a1c1">=</span> <span style="color:#a3be8c">&#34;.&#34;</span>  <span style="color:#616e87;font-style:italic"># output directory (placeholder)</span>
</span></span><span style="display:flex;"><span><span style="color:#616e87;font-style:italic"># The timestamp becomes the file name of the snapshot</span>
</span></span><span style="display:flex;"><span>now <span style="color:#81a1c1">=</span> datetime<span style="color:#81a1c1">.</span>now<span style="color:#eceff4">()</span>
</span></span><span style="display:flex;"><span>date_time <span style="color:#81a1c1">=</span> now<span style="color:#81a1c1">.</span>strftime<span style="color:#eceff4">(</span><span style="color:#a3be8c">&#34;%Y-%m-</span><span style="color:#a3be8c">%d</span><span style="color:#a3be8c">_%H_%M_%S&#34;</span><span style="color:#eceff4">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>url <span style="color:#81a1c1">=</span> <span style="color:#a3be8c">&#34;https://www.tagesschau.de/api2/&#34;</span>
</span></span><span style="display:flex;"><span>r <span style="color:#81a1c1">=</span> requests<span style="color:#81a1c1">.</span>get<span style="color:#eceff4">(</span>url<span style="color:#eceff4">)</span>
</span></span><span style="display:flex;"><span>path <span style="color:#81a1c1">=</span> join<span style="color:#eceff4">(</span>root<span style="color:#eceff4">,</span> <span style="color:#a3be8c">f</span><span style="color:#a3be8c">&#34;</span><span style="color:#a3be8c">{</span>date_time<span style="color:#a3be8c">}</span><span style="color:#a3be8c">.json&#34;</span><span style="color:#eceff4">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#616e87;font-style:italic"># Only store the snapshot if the request succeeded</span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1;font-weight:bold">if</span> r<span style="color:#81a1c1">.</span>status_code <span style="color:#81a1c1">==</span> <span style="color:#b48ead">200</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>    data <span style="color:#81a1c1">=</span> r<span style="color:#81a1c1">.</span>content
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#81a1c1;font-weight:bold">with</span> <span style="color:#81a1c1">open</span><span style="color:#eceff4">(</span>path<span style="color:#eceff4">,</span> <span style="color:#a3be8c">&#34;w&#34;</span><span style="color:#eceff4">)</span> <span style="color:#81a1c1;font-weight:bold">as</span> f<span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>        f<span style="color:#81a1c1">.</span>write<span style="color:#eceff4">(</span>data<span style="color:#81a1c1">.</span>decode<span style="color:#eceff4">())</span></span></span></code></pre></div>
<p>I automatically ran this script once per hour for more than two years, which gave me $\approx$ 15,000 unique news articles.</p>
<h2 id="exploratory-data-analysis">
  Exploratory Data Analysis
  <a href="#exploratory-data-analysis">§</a>
</h2>

<p>Now that we have a dataset, we can do some exploratory data analysis. For example, we can investigate when articles are published.
Let&rsquo;s plot the number of articles per weekday:</p>
<figure><img src="/img/tgs/days-of-week.png">
</figure>

<p>More articles are published on Wednesdays and Fridays than on other days, while the fewest
articles are published on the weekend. This sounds reasonable: fewer people work on the weekend, so there are fewer articles.
But what is the reason for the spike on Fridays?
Since the articles contain the exact publication date, we can plot the distribution of articles for each day, over each hour.
The plot looks like this:</p>
<figure><img src="/img/tgs/dist-over-days.png">
</figure>
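<p>The counts behind the two plots above could be computed roughly as follows. This is a sketch rather than the original analysis code: the <code>date</code> field name, the ISO 8601 timestamp format, and the sample records are assumptions about how the scraped articles are stored.</p>

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def count_by_weekday_and_hour(articles):
    """Count articles per weekday and per (weekday, hour) of publication."""
    per_weekday, per_hour = Counter(), Counter()
    for article in articles:
        # "date" is an assumed field holding an ISO 8601 timestamp.
        published = datetime.fromisoformat(article["date"])
        day = WEEKDAYS[published.weekday()]
        per_weekday[day] += 1
        per_hour[(day, published.hour)] += 1
    return per_weekday, per_hour


# Made-up records standing in for the scraped articles.
articles = [
    {"date": "2022-11-23T09:15:00"},  # a Wednesday
    {"date": "2022-11-25T17:05:00"},  # a Friday
    {"date": "2022-11-25T17:40:00"},  # the same Friday
    {"date": "2022-11-26T11:00:00"},  # a Saturday
]

per_weekday, per_hour = count_by_weekday_and_hour(articles)
print(per_weekday.most_common())
print(per_hour[("Friday", 17)])
```

<p>Both counters can be passed to any plotting library to obtain a bar chart per weekday and one distribution over the hours for each day.</p>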

<p>Here, we notice something interesting: quite a lot of articles are published on Fridays at around 17:00 and 20:00.
My hypothesis is that these are articles that the editorial staff pushes out so that people have something to read over the weekend.</p>
<p>Let&rsquo;s have a look at the length of the articles:</p>
<figure><img src="/img/tgs/words-per-article.png">
</figure>

<p>If the hypothesis is true, we would expect the articles published on a Friday evening to be longer than average.
A two-dimensional histogram of the number of Friday articles, plotted over the hour of publication and the number of words per article, looks like this:</p>
<figure><img src="/img/tgs/words-over-hours-friday.png"
    alt="Length of articles released on Fridays, over time."><figcaption>
      <p>Length of articles released on Fridays, over time.</p>
    </figcaption>
</figure>
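<p>A simple numeric check of the same hypothesis is to compare the mean length of Friday-evening articles with that of all other articles. The sketch below assumes hypothetical <code>date</code> and <code>text</code> fields and takes "evening" to mean 17:00 or later; the sample records are made up.</p>

```python
from datetime import datetime
from statistics import mean


def mean_length(articles, keep):
    """Mean word count of the articles whose publication time satisfies keep()."""
    lengths = [
        len(article["text"].split())
        for article in articles
        if keep(datetime.fromisoformat(article["date"]))
    ]
    return mean(lengths) if lengths else 0.0


def is_friday_evening(published):
    # weekday() == 4 is Friday; "evening" is assumed to start at 17:00.
    return published.weekday() == 4 and published.hour >= 17


# Made-up records standing in for the scraped articles.
articles = [
    {"date": "2022-11-25T20:15:00", "text": "word " * 900},  # Friday evening
    {"date": "2022-11-25T09:00:00", "text": "word " * 300},  # Friday morning
    {"date": "2022-11-22T18:30:00", "text": "word " * 400},  # Tuesday evening
]

friday_evening = mean_length(articles, is_friday_evening)
other = mean_length(articles, lambda p: not is_friday_evening(p))
print(friday_evening, other)
```

<p>On the real dataset, a clearly larger value for the Friday-evening group would be consistent with the histogram, although it would not by itself explain why those articles are longer.</p>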

<p>This seems to support the hypothesis: on Friday evenings, after the Tagesschau broadcast has aired, an unusually large number of lengthy articles is published.
This does not seem too far-fetched.</p>
<h2 id="masked-language-modeling">
  Masked Language Modeling
  <a href="#masked-language-modeling">§</a>
</h2>

<p>Masked language modeling can be seen as a special kind of classification task:
given the surrounding words, what is the probability of each candidate for the masked word?</p>
<p>Consider the sentence:</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-fallback" data-lang="fallback"><span style="display:flex;"><span>The [mask] jumps over the lazy dog.  
</span></span></code></pre></div><p>Here, we are trying to find the most probable word for <code>[mask]</code>. We can do this for every word of every document in a dataset and multiply the resulting probabilities, or, in mathematical terms,
$$ p(\mathcal{D} \vert \theta) = \prod_{x \in \mathcal{D}} \prod_{i} p(x_i \vert x_{j \neq i}, \theta) $$
where $\mathcal{D}$ is a dataset consisting of documents $x$, $x_i$ is the $i$-th word of document $x$, $x_{j \neq i}$ are the remaining words of that document, and $\theta$ are the parameters of our model.
In practice, instead of maximizing this probability during training, we minimize its negative logarithm, which turns the product into a sum.
This is also numerically more stable.</p>
<figure><img src="/img/tgs/masked-lang-model.svg"
    alt="Language modeling by recovering masked inputs."><figcaption>
      <p>Language modeling by recovering masked inputs.</p>
    </figcaption>
</figure>
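<p>The numerical argument for using the negative logarithm can be made concrete with a small sketch. The probabilities below are made-up stand-ins for the model outputs $p(x_i \vert x_{j \neq i}, \theta)$, and both function names are hypothetical.</p>

```python
import math


def likelihood(probs):
    """Product of the per-word probabilities."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def negative_log_likelihood(probs):
    """Sum of negative log-probabilities, i.e. -log(likelihood(probs))."""
    return -sum(math.log(p) for p in probs)


# Hypothetical model outputs for three masked words.
few = [0.9, 0.5, 0.8]
print(likelihood(few), negative_log_likelihood(few))

# With many words the product underflows to 0.0, while the sum stays finite.
many = [0.1] * 400
print(likelihood(many), negative_log_likelihood(many))
```

<p>For short inputs, both quantities carry the same information, since one is the negative logarithm of the other. For long inputs, only the sum remains usable in floating-point arithmetic.</p>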

<h3 id="clustering">
  Clustering
  <a href="#clustering">§</a>
</h3>

<p>We can use a model trained for masked language modeling for clustering.
Below, you can find a clustering of the articles based on their content.
Articles are vectorized by a German version of BERT, and the visualization uses PCA and t-SNE.
The color represents the category to which each article was assigned.
Using the categories as a sanity check, the clustering seems to work reasonably well.
In fact, we can even find some articles that have apparently been categorized incorrectly.</p>
<figure><img src="/img/tgs/bokeh.jpg"
    alt="Article Clustering based on BERT" width="85%"><figcaption>
      <p>Article Clustering based on BERT</p>
    </figcaption>
</figure>

<h2 id="generative-language-modeling">
  Generative Language Modeling
  <a href="#generative-language-modeling">§</a>
</h2>

<p>We can use this dataset to create a fake news generator.</p>
<figure><img src="/img/tgs/gpt.svg"
    alt="General architecture of GPT"><figcaption>
      <p>General architecture of GPT</p>
    </figcaption>
</figure>

<h3 id="gptagesschau">
  GPTagesschau
  <a href="#gptagesschau">§</a>
</h3>

<p>I fine-tuned a German GPT-2-based language model on the dataset to generate news articles in the style of tagesschau.de.
The model is not that good yet, probably because</p>
<ul>
<li>German pre-trained language models are not as good as their English counterparts and</li>
<li>the dataset is too small (15k unique articles at the time of writing).</li>
</ul>
<p>Still, the generated (fake) news articles are somewhat coherent, even if they tend to contradict themselves.
The model is also able to generate titles and headlines.</p>
<p>Below is a rather funny example. Note that I did not select this article for its realism,
but because I thought the apparent mixture of two topics made it an interesting read.</p>
<div class="quote">
    <h3>Inhaftierter Wikileaks-Gründer: Erste Anklage gegen Assange?</h3>
<p><strong>Die US-Justiz hat in London den Prozess gegen Julian Assange eingeleitet. Ihm wird vorgeworfen, im Irak Anhänger einer Terrormiliz bekämpft zu haben. Assanges Unterstützer sollen im Krieg gegen den Irak an Waffen und Ausrüstung gekommen sein.</strong></p>
<p>Ein britisches Gericht hat den Rechtsstreit um die Auslieferung des Enthüllers der Enthüllungsplattform Wikileaks, Julian Assange, in Großbritannien begonnen. Für den Gründer der Demokratie-Bewegung Assange bestehe die Chance, die Untersuchungshaft in London absitzen zu können, teilte die Londoner Generalstaatsanwältin Letitia James mit. &ldquo;Assange kann hoffen, eine Chance zu haben, in einem fairen Prozess von allen Seiten an das Richtige erinnert zu werden.&rdquo; Assange drohen bei einer Verurteilung bis zu 175 Jahre Haft. Die Staatsanwaltschaft wirft ihm vor, Anhänger der Terrormiliz &ldquo;Islamischer Staat&rdquo; (IS) und des IS-Regimes mit Waffen und Ausrüstung versorgt zu haben.</p>
<h4>Assange: Ein Anhänger von Al-Kaida und Al-Nur?</h4>
Seine Verteidiger hingegen sagten, das Gericht sei der Meinung, dass Assange zu den Aktivitäten von Al-Kaida oder der Terrororganisation Islamischer Staat (IS) gehöre. Es gebe keine Beweise dafür, dass er IS-Mitglieder angeworben habe. Dem Gericht zufolge wird Assange vorgeworfen, er habe falsche Angaben gemacht, um Kämpfer der IS-Miliz und IS-Anführer zu unterstützen. Der Anwalt von Assange, Michel Barnier, nannte die Anklageerhebung einen "Meilenstein" für ihn. "Der Rechtsstaat gibt Julian Assange das Recht, sich frei zu bewegen", sagte Barnier im Sender Euronews. 
<h4>Assange drohen bis zu 175 Jahre Haft</h4>
Ein Prozess gegen Assange wäre der erste, in dem ein Gericht ein Urteil fällte. Der Gründer der ältesten und wichtigste Nachrichtenplattform der Welt sitzt in der ecuadorianischen Metropole Quito in Haft, seit er 2007 festgenommen und im September vergangenen Jahres in die USA gebracht worden war. Es wäre die erste Anklage gegen Assange, die ein Gericht in Großbritannien erhebt. Der 37-Jährige ist der größte investigative Journalist, der je inhaftiert wurde.

</div>
]]></content:encoded></item><item><title>On Outlier Exposure with Generative Models</title><link>https://www.kkirchheim.de/papers/goe/</link><pubDate>Wed, 23 Nov 2022 00:00:00 +0000</pubDate><guid>https://www.kkirchheim.de/papers/goe/</guid><description>&lt;p&gt;Our paper &lt;strong&gt;On Outlier Exposure with Generative Models&lt;/strong&gt; has been accepted at the NeurIPS Machine Learning Safety Workshop.&lt;/p&gt;
&lt;h3 id="abstract"&gt;
Abstract
&lt;a href="#abstract"&gt;§&lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;While Outlier Exposure reliably increases the performance of Out-of-Distribution detectors, it requires a set of available outliers during training. In this paper, we propose Generative Outlier Exposure (GOE), which alleviates the need for available outliers by using generative models to sample synthetic outliers from low-density regions of the data distribution. The approach requires no modification of the generator, works on image and text data, and can be used with pre-trained models.
We demonstrate the effectiveness of generated outliers on several image and text datasets, including ImageNet.&lt;/p&gt;</description><content:encoded><![CDATA[<p>Our paper <strong>On Outlier Exposure with Generative Models</strong> has been accepted at the NeurIPS Machine Learning Safety Workshop.</p>
<h3 id="abstract">
  Abstract
  <a href="#abstract">§</a>
</h3>

<p>While Outlier Exposure reliably increases the performance of Out-of-Distribution detectors, it requires a set of available outliers during training. In this paper, we propose Generative Outlier Exposure (GOE), which alleviates the need for available outliers by using generative models to sample synthetic outliers from low-density regions of the data distribution. The approach requires no modification of the generator, works on image and text data, and can be used with pre-trained models.
We demonstrate the effectiveness of generated outliers on several image and text datasets, including ImageNet.</p>
<figure><img src="/img/goe/generated-outliers.webp"
    alt="Outliers generated by BigGAN trained on different datasets"><figcaption>
      <p>Outliers generated by BigGAN trained on different datasets</p>
    </figcaption>
</figure>

]]></content:encoded></item><item><title>Social Work Research Map</title><link>https://www.kkirchheim.de/papers/sworm-german/</link><pubDate>Fri, 11 Nov 2022 00:00:00 +0000</pubDate><guid>https://www.kkirchheim.de/papers/sworm-german/</guid><description>&lt;p&gt;During the last weeks, I worked with some colleagues on a &lt;a href="https://www.sworm.org"&gt;website&lt;/a&gt; that aims to improve access to social work literature.
We describe the results in our paper &lt;strong&gt;Social Work Research Map – ein niederschwelliger Zugang zu internationalen Publikationen der Sozialen Arbeit&lt;/strong&gt;, which has been published in the journal &lt;em&gt;Soziale Passagen&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;While the paper is written in German, there is also a &lt;a href="https://www.kkirchheim.de/pdf/sworm-technical-report.pdf"&gt;technical report&lt;/a&gt; in English.&lt;/p&gt;
&lt;h3 id="abstract"&gt;
Abstract
&lt;a href="#abstract"&gt;§&lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;Internationalization is a central topic in higher education policy in Germany. An orientation towards international discourses is also required in the teaching, research and practice of social work. Due to rapidly growing research results, obtaining a systematic overview of disciplinary knowledge is becoming increasingly difficult. This paper describes the development of an interactive website called Social Work Research Map, which should facilitate access to scientific publications in social work. For this purpose, a database with almost 25,000 journal articles from 23 social work journals was created. With the help of automated text analysis (topic modeling), the abstracts were examined and structured into 40 thematic clusters. Different visualization techniques and filter functions enable users to search the database independently according to their corresponding interests. Individual search results can be saved, and an artificial-intelligence-based recommendation system suggests similar publications. The development of SWORM is an example of the use of computer science methods in social work and illustrates the potential of structuring large amounts of text and making it accessible to people. At the same time, it becomes clear that the application of such methods is challenging for social scientists and that the use of AI raises ethical problems.&lt;/p&gt;</description><content:encoded><![CDATA[<p>During the last weeks, I worked with some colleagues on a <a href="https://www.sworm.org">website</a> that aims to improve access to social work literature.
We describe the results in our paper <strong>Social Work Research Map – ein niederschwelliger Zugang zu internationalen Publikationen der Sozialen Arbeit</strong>, which has been published in the journal <em>Soziale Passagen</em>.</p>
<p>While the paper is written in German, there is also a <a href="/pdf/sworm-technical-report.pdf">technical report</a> in English.</p>
<h3 id="abstract">
  Abstract
  <a href="#abstract">§</a>
</h3>

<p>Internationalization is a central topic in higher education policy in Germany. An orientation towards international discourses is also required in the teaching, research and practice of social work. Due to rapidly growing research results, obtaining a systematic overview of disciplinary knowledge is becoming increasingly difficult. This paper describes the development of an interactive website called Social Work Research Map, which should facilitate access to scientific publications in social work. For this purpose, a database with almost 25,000 journal articles from 23 social work journals was created. With the help of automated text analysis (topic modeling), the abstracts were examined and structured into 40 thematic clusters. Different visualization techniques and filter functions enable users to search the database independently according to their corresponding interests. Individual search results can be saved, and an artificial-intelligence-based recommendation system suggests similar publications. The development of SWORM is an example of the use of computer science methods in social work and illustrates the potential of structuring large amounts of text and making it accessible to people. At the same time, it becomes clear that the application of such methods is challenging for social scientists and that the use of AI raises ethical problems.</p>

<video class="video-shortcode" preload="auto" controls>
    <source src="https://sworm.org/static/video/sworm-intro.mp4" type="video/mp4">
    There should have been a video here but your browser does not seem
    to support it.
</video>

]]></content:encoded></item><item><title>Towards Transformer-based Homogenization of Satellite Imagery for Landsat-8 and Sentinel-2</title><link>https://www.kkirchheim.de/papers/satelite-harmonization-workshop/</link><pubDate>Sat, 13 Aug 2022 01:42:22 +0200</pubDate><guid>https://www.kkirchheim.de/papers/satelite-harmonization-workshop/</guid><description>&lt;p&gt;Our abstract &lt;strong&gt;Towards Transformer-based Homogenization of Satellite Imagery for Landsat-8 and Sentinel-2&lt;/strong&gt; was accepted for presentation at the &lt;a href="https://sites.google.com/view/esstransformers/"&gt;Transformers Workshop for Environmental Science&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In summary, we found, somewhat surprisingly, that the transformer, a neural network architecture that achieves state-of-the-art results on most tasks it is applied to, does not outperform a vanilla U-Net model on our particular super-resolution task.&lt;/p&gt;</description><content:encoded><![CDATA[<p>Our abstract <strong>Towards Transformer-based Homogenization of Satellite Imagery for Landsat-8 and Sentinel-2</strong> was accepted for presentation at the <a href="https://sites.google.com/view/esstransformers/">Transformers Workshop for Environmental Science</a>.</p>
<p>In summary, we found, somewhat surprisingly, that the transformer, a neural network architecture that achieves state-of-the-art results on most tasks it is applied to, does not outperform a vanilla U-Net model on our particular super-resolution task.</p>
]]></content:encoded></item><item><title>Convolutional Filter Visualization</title><link>https://www.kkirchheim.de/blog/filter-visualization/</link><pubDate>Wed, 27 Jul 2022 00:00:00 +0000</pubDate><guid>https://www.kkirchheim.de/blog/filter-visualization/</guid><description>&lt;p&gt;Deep Neural Networks are black boxes: they map some input to some output, and we can make them do this surprisingly well.
However, we usually have no idea how this mapping works.
In particular, &lt;a href="https://en.wikipedia.org/wiki/Convolutional_neural_network"&gt;Convolutional Neural Networks&lt;/a&gt; (CNNs), which employ &amp;ldquo;convolutions&amp;rdquo; as filters, achieved some impressive results (before Vision &lt;a href="https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)"&gt;Transformers&lt;/a&gt; came along).&lt;/p&gt;
&lt;p&gt;Filter Visualization can help us understand what kind of patterns the convolutional filters in CNNs detect.&lt;/p&gt;
&lt;h2 id="why-would-we-want-to-do-it"&gt;
Why would we want to do it?
&lt;a href="#why-would-we-want-to-do-it"&gt;§&lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;Visualizing filters can help us to get an understanding of what the neural network is doing.
The method can also be used to identify filters that the model does not need, either because they are redundant copies of other filters or because they compute no useful features at all.&lt;/p&gt;</description><content:encoded><![CDATA[<p>Deep Neural Networks are black boxes: they map some input to some output, and we can make them do this surprisingly well.
However, we usually have no idea how this mapping works.
In particular, <a href="https://en.wikipedia.org/wiki/Convolutional_neural_network">Convolutional Neural Networks</a> (CNNs), which employ &ldquo;convolutions&rdquo; as filters, achieved some impressive results (before Vision <a href="https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)">Transformers</a> came along).</p>
<p>Filter Visualization can help us understand what kind of patterns the convolutional filters in CNNs detect.</p>
<h2 id="why-would-we-want-to-do-it">
  Why would we want to do it?
  <a href="#why-would-we-want-to-do-it">§</a>
</h2>

<p>Visualizing filters can help us to get an understanding of what the neural network is doing.
The method can also be used to identify filters that the model does not need, either because they are redundant copies of other filters or because they compute no useful features at all.</p>
<h2 id="how-does-it-work">
  How does it work?
  <a href="#how-does-it-work">§</a>
</h2>

<p>Filter visualization aims to find the input $x$ that activates a certain convolutional filter the most.
Mathematically, this means we are solving
$$ \arg \max_x \mathcal{L}(f(x)), \qquad \mathcal{L}(f(x)) = \sqrt{\sum_i \sum_j f(x)_{ij}^2} , $$</p>
<p>where $f(x)_{ij}$ refers to the value at position $i,j$ in the feature map (the output of the filter) computed by $f$.</p>
<p>In practice, we solve this optimization problem via gradient descent (or, in this case, gradient ascent, since we aim to maximize the activation). That is, we start with a randomly initialized input $x$, calculate the gradient of the magnitude of the filter activation, $\nabla_x \mathcal{L}$, and iteratively update $x$ to increase the magnitude:</p>
<p>$$ x' = x + \alpha \nabla_x \mathcal{L}(f(x)) . $$</p>
<p>Additionally, we normalize the gradient during updates for stability.</p>
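<p>Written out, one way to read this normalized update is the following, where $g = \nabla_x \mathcal{L}(f(x))$, $n$ is the number of entries of $x$, and $\epsilon$ is a small constant that guards against division by zero ($10^{-6}$ in the listing below). The symbols $g$, $\hat{g}$, $n$, and $\epsilon$ are introduced here only for illustration:</p>
<p>$$ \hat{g} = \frac{g}{\sqrt{\frac{1}{n} \sum_k g_k^2} + \epsilon}, \qquad x' = x + \alpha \hat{g} . $$</p>
<p>Dividing by the root mean square of the gradient keeps the step size roughly independent of the scale of the activation, so the same $\alpha$ can be used for different filters.</p>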
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>layer <span style="color:#81a1c1">=</span> net<span style="color:#81a1c1">.</span>conv1 <span style="color:#616e87;font-style:italic"># this is the layer we are targeting </span>
</span></span><span style="display:flex;"><span>filter_no <span style="color:#81a1c1">=</span> <span style="color:#b48ead">5</span> <span style="color:#616e87;font-style:italic"># the index of the filter we are targeting </span>
</span></span><span style="display:flex;"><span>alpha <span style="color:#81a1c1">=</span>  <span style="color:#b48ead">0.01</span>  <span style="color:#616e87;font-style:italic"># learning rate </span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>x <span style="color:#81a1c1">=</span> torch<span style="color:#81a1c1">.</span>randn<span style="color:#eceff4">(</span>size<span style="color:#81a1c1">=</span><span style="color:#eceff4">(</span><span style="color:#b48ead">3</span><span style="color:#eceff4">,</span><span style="color:#b48ead">256</span><span style="color:#eceff4">,</span><span style="color:#b48ead">256</span><span style="color:#eceff4">))</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>x_v <span style="color:#81a1c1">=</span> x<span style="color:#81a1c1">.</span>unsqueeze<span style="color:#eceff4">(</span><span style="color:#b48ead">0</span><span style="color:#eceff4">)</span><span style="color:#81a1c1">.</span>cuda<span style="color:#eceff4">()</span> <span style="color:#616e87;font-style:italic"># add a batch dimension, size will be BxCxHxW </span>
</span></span><span style="display:flex;"><span>x_v<span style="color:#81a1c1">.</span>requires_grad <span style="color:#81a1c1">=</span> <span style="color:#81a1c1;font-weight:bold">True</span> <span style="color:#616e87;font-style:italic"># enable grad to include in backprop </span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#81a1c1;font-weight:bold">for</span> i <span style="color:#81a1c1;font-weight:bold">in</span> <span style="color:#81a1c1">range</span><span style="color:#eceff4">(</span><span style="color:#b48ead">50</span><span style="color:#eceff4">):</span>
</span></span><span style="display:flex;"><span>   <span style="color:#616e87;font-style:italic"># gradient ascent iteration</span>
</span></span><span style="display:flex;"><span>   out <span style="color:#81a1c1">=</span> layer<span style="color:#eceff4">(</span>x_v<span style="color:#eceff4">)</span>  <span style="color:#616e87;font-style:italic"># size will be BxCxHxW</span>
</span></span><span style="display:flex;"><span>   f <span style="color:#81a1c1">=</span> out<span style="color:#eceff4">[</span><span style="color:#b48ead">0</span><span style="color:#eceff4">,</span> filter_no<span style="color:#eceff4">]</span> <span style="color:#616e87;font-style:italic"># select filter by index, size will be HxW</span>
</span></span><span style="display:flex;"><span>   
</span></span><span style="display:flex;"><span>   loss <span style="color:#81a1c1">=</span> f<span style="color:#81a1c1">.</span>pow<span style="color:#eceff4">(</span><span style="color:#b48ead">2</span><span style="color:#eceff4">)</span><span style="color:#81a1c1">.</span>sum<span style="color:#eceff4">()</span><span style="color:#81a1c1">.</span>sqrt<span style="color:#eceff4">()</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>   <span style="color:#616e87;font-style:italic"># reset the gradient accumulated in the previous iteration</span>
</span></span><span style="display:flex;"><span>   <span style="color:#81a1c1;font-weight:bold">if</span> x_v<span style="color:#81a1c1">.</span>grad <span style="color:#81a1c1;font-weight:bold">is</span> <span style="color:#81a1c1;font-weight:bold">not</span> <span style="color:#81a1c1;font-weight:bold">None</span><span style="color:#eceff4">:</span>
</span></span><span style="display:flex;"><span>      x_v<span style="color:#81a1c1">.</span>grad<span style="color:#81a1c1">.</span>data <span style="color:#81a1c1">=</span> torch<span style="color:#81a1c1">.</span>zeros_like<span style="color:#eceff4">(</span>x_v<span style="color:#81a1c1">.</span>grad<span style="color:#81a1c1">.</span>data<span style="color:#eceff4">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>   loss<span style="color:#81a1c1">.</span>backward<span style="color:#eceff4">()</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>   <span style="color:#81a1c1;font-weight:bold">with</span> torch<span style="color:#81a1c1">.</span>no_grad<span style="color:#eceff4">():</span>
</span></span><span style="display:flex;"><span>      <span style="color:#616e87;font-style:italic"># gradient normalization and update </span>
</span></span><span style="display:flex;"><span>      x_v<span style="color:#81a1c1">.</span>grad <span style="color:#81a1c1">/=</span> x_v<span style="color:#81a1c1">.</span>grad<span style="color:#81a1c1">.</span>pow<span style="color:#eceff4">(</span><span style="color:#b48ead">2</span><span style="color:#eceff4">)</span><span style="color:#81a1c1">.</span>mean<span style="color:#eceff4">()</span><span style="color:#81a1c1">.</span>sqrt<span style="color:#eceff4">()</span> <span style="color:#81a1c1">+</span> <span style="color:#b48ead">0.000001</span>
</span></span><span style="display:flex;"><span>      x_v <span style="color:#81a1c1">+=</span> x_v<span style="color:#81a1c1">.</span>grad <span style="color:#81a1c1">*</span> alpha
</span></span></code></pre></div><h2 id="results">
  Results
  <a href="#results">§</a>
</h2>

<p>What we observe is that the deeper we go, the more abstract the features become.
While the low-level features (lines with different orientations, certain colors, and color blobs) are comparatively straightforward to interpret,
guessing the meaning of the higher-level features feels more like a <a href="https://en.wikipedia.org/wiki/Rorschach_test">Rorschach test</a>.</p>
<figure><img src="/img/filter-viz/conv-1.webp"
    alt="Filter visualization of the first convolutional layer of a ResNet 101" width="100%"><figcaption>
      <p>Filter visualization of the first convolutional layer of a ResNet 101</p>
    </figcaption>
</figure>

<p>These observations are evidence for the hypothesis that neural networks learn increasingly abstract, high-level features in upper layers.
On the other hand, this also means that we cannot really get an understanding of what these upper layers are doing.</p>
<figure><img src="/img/filter-viz/layer4.webp"
    alt="Filter visualization of the last layer of a ResNet 101" width="100%"><figcaption>
      <p>Filter visualization of the last layer of a ResNet 101</p>
    </figcaption>
</figure>

]]></content:encoded></item><item><title>Multi-Class Hypersphere Anomaly Detection (MCHAD)</title><link>https://www.kkirchheim.de/papers/mchad/</link><pubDate>Wed, 13 Jul 2022 21:58:50 +0200</pubDate><guid>https://www.kkirchheim.de/papers/mchad/</guid><description>&lt;p&gt;Our paper &lt;strong&gt;Multi-Class Hypersphere Anomaly Detection&lt;/strong&gt; (MCHAD) has been accepted for presentation at ICPR 2022.
In summary, we propose a new loss function for training neural networks that can detect anomalies in their inputs.&lt;/p&gt;
&lt;figure class="figure-right"&gt;&lt;a href="https://www.kkirchheim.de/pdf/mchad-poster.pdf"&gt;&lt;img src="https://www.kkirchheim.de/img/thumbs/mchad-poster.jpg"
alt="Poster for MCHAD (PDF)." width="250px"&gt;&lt;/a&gt;&lt;figcaption&gt;
&lt;p&gt;Poster for MCHAD (PDF).&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;MCHAD is available via &lt;a href="https://www.kkirchheim.de/papers/pytorch-ood"&gt;pytorch-ood&lt;/a&gt;. You can find example code &lt;a href="https://pytorch-ood.readthedocs.io/en/latest/auto_examples/loss/supervised/mchad.html#sphx-glr-auto-examples-loss-supervised-mchad-py"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="how-does-it-work"&gt;
How does it work?
&lt;a href="#how-does-it-work"&gt;§&lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;The general idea is that we want a neural network $f_{\theta}: \mathcal{X} \rightarrow \mathcal{Z}$ that maps inputs from the input space to some lower dimensional representation in such a way that points from class $y$ cluster around a hypersphere with center $\mu_y$ in the output space.
Because the neural network can learn non-linear functions, the classes in the input space can have arbitrarily complex shapes.&lt;/p&gt;</description><content:encoded><![CDATA[<p>Our paper <strong>Multi-Class Hypersphere Anomaly Detection</strong> (MCHAD) has been accepted for presentation at ICPR 2022.
In summary, we propose a new loss function for training neural networks that can detect anomalies in their inputs.</p>
<figure class="figure-right"><a href="/pdf/mchad-poster.pdf"><img src="/img/thumbs/mchad-poster.jpg"
    alt="Poster for MCHAD (PDF)." width="250px"></a><figcaption>
      <p>Poster for MCHAD (PDF).</p>
    </figcaption>
</figure>

<p>MCHAD is available via <a href="/papers/pytorch-ood">pytorch-ood</a>. You can find example code <a href="https://pytorch-ood.readthedocs.io/en/latest/auto_examples/loss/supervised/mchad.html#sphx-glr-auto-examples-loss-supervised-mchad-py">here</a>.</p>
<h2 id="how-does-it-work">
  How does it work?
  <a href="#how-does-it-work">§</a>
</h2>

<p>The general idea is that we want a neural network $f_{\theta}: \mathcal{X} \rightarrow \mathcal{Z}$ that maps inputs from the input space to some lower dimensional representation in such a way that points from class $y$ cluster around a hypersphere with center $\mu_y$ in the output space.
Because the neural network can learn non-linear functions, the classes in the input space can have arbitrarily complex shapes.</p>
<p>To train this neural network, we optimize its parameters $\theta$ to minimize a loss function.
We then hope that the model maps only points from the known classes into the spheres of the corresponding classes, while points that are dissimilar to the training data (i.e., anomalies) are mapped further away, because the model never learned to map them close to the centers.</p>
<p>Omitting some details, the loss function we propose has three different components, each of which we will explain in the following.</p>
<figure><img src="/img/mchad/arch.webp">
</figure>

<h3 id="intra-class-variance">
  Intra-Class Variance
  <a href="#intra-class-variance">§</a>
</h3>

<p>We want the representations $f_{\theta}(x)$ of one class to cluster as tightly around a class center $\mu_y$ as possible.
For this, we can use the <em>intra-class variance loss</em>, which is defined as:</p>
<p>$$  \mathcal{L}_{\Lambda}(x,y) = \Vert \mu_y - f_{\theta}(x) \Vert^2 $$</p>
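<p>As a minimal sketch of this term in PyTorch: the function name <code>intra_class_loss</code>, the tensor names, and the averaging over the batch are assumptions made for illustration, not the reference implementation from pytorch-ood.</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python">import torch


def intra_class_loss(z, y, centers):
    """Mean squared distance between each embedding and its own class center.

    z: embeddings f_theta(x), shape (batch, dim)
    y: integer class labels, shape (batch,)
    centers: class centers mu, shape (num_classes, dim)
    """
    # centers[y] selects the center that belongs to each sample's label
    return (centers[y] - z).pow(2).sum(dim=1).mean()


# toy example (hypothetical values): two classes in a 2-dimensional output space
centers = torch.tensor([[0.0, 0.0], [4.0, 0.0]])
z = torch.tensor([[1.0, 0.0], [4.0, 2.0]])
y = torch.tensor([0, 1])

loss = intra_class_loss(z, y, centers)
print(loss.item())  # squared distances are 1 and 4, so the mean is 2.5
</code></pre></div>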
<h3 id="inter-class-variance">
  Inter-Class Variance
  <a href="#inter-class-variance">§</a>
</h3>

<p>A trivial solution to minimize $ \mathcal{L}_{\Lambda}$ would be to map all inputs to the same point, which would lead to the collapse of the model.
To prevent this, we have to add a second term that ensures that the points remain separable.
Let $d_j = \lVert \mu_j - f_{\theta}(x) \rVert^2$. Then we define the inter-class variance loss term as</p>
<p>$$
\begin{align*}
\mathcal L_\Delta(x,y)
&amp;= \log \left( 1 + \sum_{j \ne y} e^{d_y - d_j} \right) \\
&amp;= \log \left( \frac{e^{d_y}}{e^{d_y}} + \sum_{j \ne y} \frac{e^{d_y}}{e^{d_j}} \right) \\
&amp;= \log \left( \sum_{j} \frac{e^{d_y}}{e^{d_j}} \right) \\
&amp;= \log \left( e^{d_y} \sum_{j} e^{-d_j} \right) \\
&amp;= d_y + \log \left( \sum_{j} e^{-d_j} \right) \\
&amp;= -\log \left( \frac{e^{-d_y}}{\sum_j e^{-d_j}} \right) \\
&amp;= -\log \operatorname{softmax}_y(-d) .
\end{align*}
$$</p>
<p>We can see that, in this loss term, the negative squared distances to the centers, i.e. $-d_j$, take the role of the logits in a standard softmax classifier.</p>
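<p>The last line of the derivation suggests that this term can be computed with an ordinary cross-entropy function. The following is a sketch under that reading; the function name <code>inter_class_loss</code> and the tensor names are hypothetical, and the loss is averaged over the batch by default.</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python">import torch
import torch.nn.functional as F


def inter_class_loss(z, y, centers):
    """Cross-entropy with negative squared distances as logits.

    z: embeddings, shape (batch, dim)
    y: integer class labels, shape (batch,)
    centers: class centers, shape (num_classes, dim)
    """
    # d[b, j] is the squared distance between embedding b and center j
    d = (z.unsqueeze(1) - centers.unsqueeze(0)).pow(2).sum(dim=2)
    # cross_entropy applies log-softmax to the logits and averages over the batch
    return F.cross_entropy(-d, y)


# toy example (hypothetical values): one point halfway between two centers
centers = torch.tensor([[0.0, 0.0], [2.0, 0.0]])
z = torch.tensor([[1.0, 0.0]])
y = torch.tensor([0])

loss = inter_class_loss(z, y, centers)
print(loss.item())  # both distances are equal, so the loss is log(2), about 0.693
</code></pre></div>
<p>For a point that is equidistant from both centers, the first line of the derivation gives $\log(1 + e^{0}) = \log 2$, which matches the softmax form.</p>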
<figure><img src="/img/mchad/mchad.webp"
    alt="MCHAD on CIFAR 10 with $\mathcal{Z} = \mathbb{R}^2$" width="500pt"><figcaption>
      <p>MCHAD on CIFAR 10 with $\mathcal{Z} = \mathbb{R}^2$</p>
    </figcaption>
</figure>

<h3 id="extra-class-variance">
  Extra-Class Variance
  <a href="#extra-class-variance">§</a>
</h3>

<p>Sometimes, we have a set of example outliers at hand.
Previous work showed that the robustness of models can be significantly improved by including these in the optimization.
Therefore, we can add a term that incentivizes such outliers to be mapped sufficiently far away from the class centers:</p>
<p>$$ \mathcal{L}_{\Theta}(x) = \max \lbrace 0, r_y^2 -  \Vert \mu_y - f_{\theta}(x) \Vert^2 \rbrace $$</p>
<p>where $x$ is some outlier and $r_y$ is a class-conditional radius.
This term can also be applied to other methods that aim to learn spherical clusters in their output space.
We refer to MCHAD trained with this additional term as Generalized MCHAD.</p>
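<p>A rough sketch of this hinge term is given below. It makes two assumptions that the formula above does not spell out: the penalty is summed over all class centers for each outlier, and a single scalar radius is shared by all classes. The function name <code>extra_class_loss</code> is hypothetical.</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python">import torch


def extra_class_loss(z_outlier, centers, radius):
    """Penalize outlier embeddings that fall inside the sphere of any class.

    z_outlier: embeddings of outliers, shape (batch, dim)
    centers: class centers, shape (num_classes, dim)
    radius: sphere radius, assumed here to be shared by all classes
    """
    # d[b, j] is the squared distance between outlier b and center j
    d = (z_outlier.unsqueeze(1) - centers.unsqueeze(0)).pow(2).sum(dim=2)
    # hinge: only outliers closer than the radius contribute to the loss
    hinge = torch.clamp(radius ** 2 - d, min=0)
    return hinge.sum(dim=1).mean()


# toy example (hypothetical values)
centers = torch.tensor([[0.0, 0.0], [4.0, 0.0]])
outliers = torch.tensor([[1.0, 0.0], [10.0, 0.0]])

loss = extra_class_loss(outliers, centers, radius=2.0)
print(loss.item())  # first outlier contributes 4 - 1 = 3, second contributes 0, so the mean is 1.5
</code></pre></div>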
<figure><img src="/img/mchad/gmchad.webp"
    alt="Generalized MCHAD on CIFAR 10  with $\mathcal{Z} = \mathbb{R}^2$" width="500pt"><figcaption>
      <p>Generalized MCHAD on CIFAR 10  with $\mathcal{Z} = \mathbb{R}^2$</p>
    </figcaption>
</figure>

<h2 id="how-well-does-it-work">
  How well does it work?
  <a href="#how-well-does-it-work">§</a>
</h2>

<p>Our experiments found that both MCHAD and Generalized MCHAD outperform other hypersphere learning methods.
In ablation studies, we also investigated the influence of each of the loss terms and demonstrated that all of them contribute to the overall performance, both in terms of discriminative power on normal data and the ability to detect anomalies.</p>
]]></content:encoded></item><item><title>On Challenging Aspects of Reproducibility in Deep Anomaly Detection</title><link>https://www.kkirchheim.de/papers/mchad-reproducibility/</link><pubDate>Wed, 13 Jul 2022 00:00:00 +0000</pubDate><guid>https://www.kkirchheim.de/papers/mchad-reproducibility/</guid><description>&lt;p&gt;Our companion paper, &lt;strong&gt;On Challenging Aspects of Reproducibility in Deep Anomaly Detection&lt;/strong&gt;, has been accepted for presentation at the Fourth Workshop on Reproducible Research in Pattern Recognition (satellite event of ICPR 2022).&lt;/p&gt;
&lt;p&gt;In it, we discuss aspects of reproducibility for our anomaly detection algorithm &lt;a href="https://www.kkirchheim.de/papers/mchad"&gt;MCHAD&lt;/a&gt;, as well as anomaly detection with deep neural networks in general.
In particular, we discuss the following challenges for reproducibility:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Nondeterminism: conducting the same experiment with different random seeds might lead to significantly different outcomes.&lt;/li&gt;
&lt;li&gt;Sensitivity to hyper-parameters: slight changes in hyper-parameters can drastically alter the outcomes.&lt;/li&gt;
&lt;li&gt;Complexity: the more complex an algorithm, the more likely an implementation contains errors.&lt;/li&gt;
&lt;li&gt;Dataset selection: the performance of a method depends on the dataset on which it is evaluated.&lt;/li&gt;
&lt;li&gt;Resource limitations: resource requirements can limit the number of individuals or institutions that are able to reproduce the training.&lt;/li&gt;
&lt;li&gt;Dependencies: dependencies, in the form of data, pre-trained weights, or software libraries, might get taken down at some point.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The large number of dependencies in our experiments may harm the reproducibility of our exact numerical results.
However, we argue that the reproducibility of conclusions should be prioritized over the reproducibility of exact numerical results since the former contributes to the advancement of scientific knowledge.&lt;/p&gt;</description><content:encoded><![CDATA[<p>Our companion paper, <strong>On Challenging Aspects of Reproducibility in Deep Anomaly Detection</strong>, has been accepted for presentation at the Fourth Workshop on Reproducible Research in Pattern Recognition (satellite event of ICPR 2022).</p>
<p>In it, we discuss aspects of reproducibility for our anomaly detection algorithm <a href="/papers/mchad">MCHAD</a>, as well as anomaly detection with deep neural networks in general.
In particular, we discuss the following challenges for reproducibility:</p>
<ul>
<li>Nondeterminism: conducting the same experiment with different random seeds might lead to significantly different outcomes.</li>
<li>Sensitivity to hyper-parameters: slight changes in hyper-parameters can drastically alter the outcomes.</li>
<li>Complexity: the more complex an algorithm, the more likely an implementation contains errors.</li>
<li>Dataset selection: the performance of a method depends on the dataset on which it is evaluated.</li>
<li>Resource limitations: resource requirements can limit the number of individuals or institutions that are able to reproduce the training.</li>
<li>Dependencies: dependencies, in the form of data, pre-trained weights, or software libraries, might get taken down at some point.</li>
</ul>
<p>The large number of dependencies in our experiments may harm the reproducibility of our exact numerical results.
However, we argue that the reproducibility of conclusions should be prioritized over the reproducibility of exact numerical results since the former contributes to the advancement of scientific knowledge.</p>
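<p>To illustrate the nondeterminism item from the list above, a common first step is to fix the random seeds before each run. The helper below is a minimal sketch, not the setup used in the paper; the name <code>seed_everything</code> is hypothetical, and other libraries with their own random number generators (for example NumPy) would need to be seeded in the same way.</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python">import random

import torch


def seed_everything(seed):
    """Seed the random number generators of Python and PyTorch."""
    random.seed(seed)
    torch.manual_seed(seed)


# two draws after seeding with the same value are identical
seed_everything(0)
a = torch.randn(3)
seed_everything(0)
b = torch.randn(3)

print(torch.equal(a, b))  # True
</code></pre></div>
<p>Fixing the seed does not remove every source of nondeterminism (some GPU operations are nondeterministic by default), which is one reason to report results over several seeds rather than a single run.</p>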
]]></content:encoded></item><item><title>PyTorch-OOD: A library for Out-of-Distribution Detection based on PyTorch</title><link>https://www.kkirchheim.de/papers/pytorch-ood/</link><pubDate>Wed, 13 Jul 2022 00:00:00 +0000</pubDate><guid>https://www.kkirchheim.de/papers/pytorch-ood/</guid><description>&lt;p&gt;Our paper, &lt;strong&gt;PyTorch-OOD: A library for Out-of-Distribution Detection based on PyTorch&lt;/strong&gt;, has been presented at the CVPR 2022 Workshops.
You can find the most recent version of the Python source code on &lt;a href="https://github.com/kkirchheim/pytorch-ood"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="abstract"&gt;
Abstract
&lt;a href="#abstract"&gt;§&lt;/a&gt;
&lt;/h3&gt;
&lt;p&gt;Machine Learning models based on Deep Neural Networks behave unpredictably when presented with inputs that do not stem from the training distribution and sometimes make egregiously wrong predictions with high confidence. This property undermines the trustworthiness of systems depending on such models and potentially threatens the safety of their users. Out-of-distribution (OOD) detection mechanisms can be used to prevent errors by detecting inputs that are so dissimilar from the training set that the model can not be expected to make reliable predictions. In this paper, we present PyTorch-OOD, a Python library for OOD detection based on PyTorch. Its primary goals are to accelerate OOD detection research and improve the reproducibility and comparability of experiments. PyTorch-OOD provides well-tested and documented implementations of OOD detection methods with a unified interface, as well as training and benchmark datasets, architectures, pre-trained models, and utility functions. The library is available online under the permissive Apache 2.0 license and can be installed via the Python Package Index (PyPI).&lt;/p&gt;</description><content:encoded><![CDATA[<p>Our paper, <strong>PyTorch-OOD: A library for Out-of-Distribution Detection based on PyTorch</strong>, has been presented at the CVPR 2022 Workshops.
You can find the most recent version of the Python source code on <a href="https://github.com/kkirchheim/pytorch-ood">GitHub</a>.</p>
<h3 id="abstract">
  Abstract
  <a href="#abstract">§</a>
</h3>

<p>Machine Learning models based on Deep Neural Networks behave unpredictably when presented with inputs that do not stem from the training distribution and sometimes make egregiously wrong predictions with high confidence. This property undermines the trustworthiness of systems depending on such models and potentially threatens the safety of their users. Out-of-distribution (OOD) detection mechanisms can be used to prevent errors by detecting inputs that are so dissimilar from the training set that the model can not be expected to make reliable predictions. In this paper, we present PyTorch-OOD, a Python library for OOD detection based on PyTorch. Its primary goals are to accelerate OOD detection research and improve the reproducibility and comparability of experiments. PyTorch-OOD provides well-tested and documented implementations of OOD detection methods with a unified interface, as well as training and benchmark datasets, architectures, pre-trained models, and utility functions. The library is available online under the permissive Apache 2.0 license and can be installed via the Python Package Index (PyPI).</p>
<h3 id="installation">
  Installation
  <a href="#installation">§</a>
</h3>

<p>You can install the package directly via pip:</p>
<div class="highlight"><pre tabindex="0" style="color:#d8dee9;background-color:#2e3440;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>pip install pytorch-ood
</span></span></code></pre></div><h3 id="presentation">
  Presentation
  <a href="#presentation">§</a>
</h3>

<p>The presentation slides are available <a href="/pdf/pytorch-ood-presentation.pdf">here</a>.</p>
]]></content:encoded></item><item><title>Addressing Randomness in Evaluation Protocols for Out-of-Distribution Detection</title><link>https://www.kkirchheim.de/papers/randomness-in-ood/</link><pubDate>Tue, 13 Jul 2021 00:00:00 +0000</pubDate><guid>https://www.kkirchheim.de/papers/randomness-in-ood/</guid><description>&lt;p&gt;Our paper &lt;strong&gt;Addressing Randomness in Evaluation Protocols for Out-of-Distribution Detection&lt;/strong&gt; has been accepted at the &lt;a href="https://sites.google.com/view/ai4an2021"&gt;IJCAI 2021 Workshop&lt;/a&gt; for &lt;em&gt;Artificial Intelligence for Anomalies and Novelties&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;In summary, we investigated the following phenomenon:
when you train neural networks several times, and then measure their performance on some task, there is a certain variance in the performance measurements, since the results of experiments may vary based on several factors (that are effectively controlled by the random seed).
We investigated how the performance measures for several evaluation protocols used in Anomaly Detection, Out-of-Distribution Detection, Open Set Recognition (OSR) and related fields vary when the random seed is varied.&lt;/p&gt;</description><content:encoded><![CDATA[<p>Our paper <strong>Addressing Randomness in Evaluation Protocols for Out-of-Distribution Detection</strong> has been accepted at the <a href="https://sites.google.com/view/ai4an2021">IJCAI 2021 Workshop</a> for <em>Artificial Intelligence for Anomalies and Novelties</em>.</p>
<p>In summary, we investigated the following phenomenon:
when you train neural networks several times, and then measure their performance on some task, there is a certain variance in the performance measurements, since the results of experiments may vary based on several factors (that are effectively controlled by the random seed).
We investigated how the performance measures for several evaluation protocols used in Anomaly Detection, Out-of-Distribution Detection, Open Set Recognition (OSR) and related fields vary when the random seed is varied.</p>
<p>In some of these fields, like OSR, it is common to measure the average performance over 3-5 experiments. Is this sufficient to draw reliable conclusions regarding a possible performance difference between methods?</p>
<p>We found that the variance is so large that it may, in fact, not be.
Consequently, experiments based on too few random seeds might provide a brittle foundation for conclusions.
We thus argue that such experiments should rather be seen as a fundamentally random process.
Therefore, we should measure the expected value of the performance
$\mathbb{E}_{x \sim p} [ f(x) ] $
where $p$ is the distribution of the random seeds and $f$ is an experimental setting.</p>
<p>Given a set of measurements, we can use statistical tests to determine if an observed difference can be considered significant.
However, we found that in some cases even 1000 experiments were insufficient to infer significant differences in the results.</p>
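<p>To make the idea concrete, here is a minimal sketch in plain Python. It is not the code used in the paper: <code>run_experiment</code> is a hypothetical stand-in for $f$ that just draws a noisy score instead of training a model, and the permutation test is only one of several statistical tests that could be used here. The sketch estimates the expected performance of two methods over many seeds and checks whether their difference can be considered significant.</p>

```python
import random
import statistics


def run_experiment(seed: int, true_mean: float, noise: float = 0.02) -> float:
    """Hypothetical stand-in for f(x): one experiment with random seed x.

    A real experiment would train and evaluate a model. Here we only draw a
    noisy score around true_mean so that the example runs instantly.
    """
    rng = random.Random(seed)
    return rng.gauss(true_mean, noise)


def estimate_expected_performance(scores):
    """Monte Carlo estimate of E[f(x)] together with its standard error."""
    mean = statistics.fmean(scores)
    stderr = statistics.stdev(scores) / len(scores) ** 0.5
    return mean, stderr


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of the sample means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled scores to the two methods.
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.fmean(left) - statistics.fmean(right)) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Assumed setting: method B is truly better than method A by 0.005.
    for n_seeds in (5, 200):
        scores_a = [run_experiment(s, true_mean=0.900) for s in range(n_seeds)]
        scores_b = [run_experiment(s + 10_000, true_mean=0.905) for s in range(n_seeds)]
        mean_a, err_a = estimate_expected_performance(scores_a)
        mean_b, err_b = estimate_expected_performance(scores_b)
        p_value = permutation_test(scores_a, scores_b)
        print(f"n={n_seeds}: A={mean_a:.4f}+-{err_a:.4f} "
              f"B={mean_b:.4f}+-{err_b:.4f} p={p_value:.3f}")
```

<p>The true means and the noise level are made up for illustration. In a real setting, each call to <code>run_experiment</code> would be a full training and evaluation run, which is what makes large numbers of seeds expensive.</p>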
]]></content:encoded></item><item><title>Data-Mining als Werkzeug empirischer Sozialforschung</title><link>https://www.kkirchheim.de/papers/socialnet/</link><pubDate>Mon, 13 Jul 2020 22:09:10 +0200</pubDate><guid>https://www.kkirchheim.de/papers/socialnet/</guid><description>&lt;p&gt;Inspired by David Kriesel&amp;rsquo;s talk &amp;ldquo;&lt;a href="https://www.youtube.com/watch?v=-YpwsdRKt8Q"&gt;Spiegel-Mining&lt;/a&gt;&amp;rdquo;, a friend of mine and a professor from the Hochschule Magdeburg scraped a German website that regularly publishes reviews of social work literature and mined the resulting 18,000 articles, hoping for interesting insights.&lt;/p&gt;
&lt;p&gt;In an attempt to visualize the discourse, we created several topic maps, like the one below, which you can find on the accompanying (German) &lt;a href="https://extra-mining.de/"&gt;website&lt;/a&gt;.
The colors represent the gender of the authors of the review.
Note that we are not entirely sure if the editors or the authors are responsible for this gender assignment.
Also, the explicit gender assignment has since been removed and can no longer be found on the scraped website.&lt;/p&gt;</description><content:encoded><![CDATA[<p>Inspired by David Kriesel&rsquo;s talk &ldquo;<a href="https://www.youtube.com/watch?v=-YpwsdRKt8Q">Spiegel-Mining</a>&rdquo;, a friend of mine and a professor from the Hochschule Magdeburg scraped a German website that regularly publishes reviews of social work literature and mined the resulting 18,000 articles, hoping for interesting insights.</p>
<p>In an attempt to visualize the discourse, we created several topic maps, like the one below, which you can find on the accompanying (German) <a href="https://extra-mining.de/">website</a>.
The colors represent the gender of the authors of the review.
Note that we are not entirely sure if the editors or the authors are responsible for this gender assignment.
Also, the explicit gender assignment has since been removed and can no longer be found on the scraped website.</p>
<figure><img src="/img/socialnet/graph_altenpflege_thumb_256x256.webp"
    alt="Section of the concept web, visualized with Gephi" width="256px"><figcaption>
      <p>Section of the concept web, visualized with Gephi</p>
    </figcaption>
</figure>

<p>The findings were not surprising: people whom the website identified as women tend to write reviews on topics in fields one could consider traditionally female-dominated, like child care.</p>
]]></content:encoded></item><item><title>Explanation-based Anomaly Detection in Deep Neural Networks</title><link>https://www.kkirchheim.de/papers/masters-thesis/</link><pubDate>Sat, 01 Feb 2020 00:00:00 +0000</pubDate><guid>https://www.kkirchheim.de/papers/masters-thesis/</guid><description>&lt;figure class="figure-right"&gt;&lt;a href="https://www.kkirchheim.de/pdf/masters-thesis.pdf"&gt;&lt;img src="https://www.kkirchheim.de/img/thumbs/masters-thesis.webp"
alt="Master's Thesis (PDF)." width="100px"&gt;&lt;/a&gt;&lt;figcaption&gt;
&lt;p&gt;Master's Thesis (PDF).&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;If an AI gives you a weird explanation for its prediction, you should remain skeptical about the accuracy of the prediction. Sounds reasonable?&lt;/p&gt;
&lt;p&gt;This was the general idea of my master's thesis, which was originally titled &lt;strong&gt;Self-Assessment of Visual Recognition Systems based on Attribution&lt;/strong&gt;.
Today, I would call it &lt;strong&gt;Explanation-based Anomaly Detection in Deep Neural Networks&lt;/strong&gt;.
The general idea was to use attribution-based explanation methods to detect anomalies (such as unusual inputs) in convolutional neural networks.
This basically boils down to detecting unusual gradients in the network, which, at the time, was, to my knowledge, a novel idea.
We did some experiments and found that it somewhat worked in some cases.&lt;/p&gt;</description><content:encoded><![CDATA[<figure class="figure-right"><a href="/pdf/masters-thesis.pdf"><img src="/img/thumbs/masters-thesis.webp"
    alt="Master's Thesis (PDF)." width="100px"></a><figcaption>
      <p>Master's Thesis (PDF).</p>
    </figcaption>
</figure>

<p>If an AI gives you a weird explanation for its prediction, you should remain skeptical about the accuracy of the prediction. Sounds reasonable?</p>
<p>This was the general idea of my master's thesis, which was originally titled <strong>Self-Assessment of Visual Recognition Systems based on Attribution</strong>.
Today, I would call it <strong>Explanation-based Anomaly Detection in Deep Neural Networks</strong>.
The general idea was to use attribution-based explanation methods to detect anomalies (such as unusual inputs) in convolutional neural networks.
This basically boils down to detecting unusual gradients in the network, which, at the time, was, to my knowledge, a novel idea.
We did some experiments and found that it somewhat worked in some cases.</p>
<h2 id="abstract">
  Abstract
  <a href="#abstract">§</a>
</h2>

<p>Convolutional Neural Networks (CNNs) achieve state-of-the-art results in various visual
recognition tasks like object classification and object detection. While CNNs
perform surprisingly well, it is difficult to retrace why they arrive at a certain
prediction. Additionally, they have been shown to be prone to certain errors.
As CNNs are increasingly deployed into physical systems - for example in self-driving
vehicles - undetected errors could result in catastrophic consequences.
Approaches to prevent this include the usage of attribution-based explanation
methods to facilitate an understanding of the system's decision in hindsight, as
well as the detection of recognition errors at runtime, called self-assessment.
Some state-of-the-art self-assessment approaches aim to detect anomalies in the
activation patterns of neurons in a CNN.</p>
<p>This work explores the usage of attribution-based explanations for self-assessment of CNNs. We build multiple self-assessment models and evaluate
their performance in various settings. In our experiments, we find that, while
self-assessment based on attribution does not outperform self-assessment based
on neural activity on its own, it always surpasses random guessing. Furthermore, we find that self-assessment models using neural activation patterns as
well as neural attribution can in some cases outperform models which do not
consider attribution patterns. Thus, we conclude that it might be possible to
improve self-assessment models by including the explanation of the model into
the assessment process.</p>
]]></content:encoded></item></channel></rss>