<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://om2005prakash.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://om2005prakash.github.io/" rel="alternate" type="text/html" hreflang="en" /><updated>2026-01-30T17:31:00+00:00</updated><id>https://om2005prakash.github.io/feed.xml</id><title type="html">Om Prakash</title><subtitle>My Portfolio</subtitle><author><name>Om Prakash</name></author><entry><title type="html">Editing Concepts in Diffusion Models</title><link href="https://om2005prakash.github.io/Unlearning/" rel="alternate" type="text/html" title="Editing Concepts in Diffusion Models" /><published>2025-08-15T00:00:00+00:00</published><updated>2025-08-15T00:00:00+00:00</updated><id>https://om2005prakash.github.io/Unlearning</id><content type="html" xml:base="https://om2005prakash.github.io/Unlearning/"><![CDATA[<p><img src="https://raw.githubusercontent.com/Om2005Prakash/Editing-Concepts-in-Stable-Diffusion/refs/heads/main/assets/results.png" alt="CLIP Editing Results" /></p>

<blockquote>
  <p><strong>Note:</strong> This is a solution write-up for our submission to the <strong>Unlearning and Model Editing (U&amp;ME) Workshop at ICCV ‘25</strong>.</p>
  <ul>
    <li><a href="https://sites.google.com/view/u-and-me-workshop/">Workshop Website</a></li>
    <li><a href="https://shreyanshhub.github.io/GENMU-/leaderboard.html">Current Leaderboard</a></li>
  </ul>
</blockquote>

<p><em>(For the technical solution, skip to the section “Null-Space Constrained Editing”)</em></p>

<h2 id="motivation-for-unlearning">Motivation for Unlearning</h2>

<p>Over the past few months, I’ve been exploring a deceptively simple question:
<strong>Can a diffusion model forget — without losing everything else it knows?</strong></p>

<p>As text-to-image models become increasingly capable, they also inherit the biases, copyrighted material, and harmful content of their massive training datasets. From <em>LAION-5B</em> to other web-scale corpora powering today’s generative systems, this raises a critical issue: <em>how</em> can we make them unlearn responsibly?</p>

<p>Editing can serve as a form of unlearning. If we can edit an unwanted concept into a safe target concept, we can effectively “forget” the original one. This post describes a method I call <strong>Null-Space Constrained Concept Editing</strong>, adapted from a paper called <strong>AlphaEdit</strong> (<a href="https://arxiv.org/abs/2410.02355">arXiv:2410.02355</a>), which enables <em>selective editing</em> in diffusion models while preserving unrelated knowledge.</p>

<p>In this post, we focus on editing the <strong>CLIP text encoder</strong> to realign the embeddings of unwanted concepts toward safe targets. Crucially, we do <strong>not</strong> modify the UNet weights directly; however, we leverage the UNet to guide the update direction (as detailed in the section <em>Guidance From the UNet</em>).</p>

<hr />

<h2 id="the-problem-naive-editing">The Problem: Naive Editing</h2>

<p>A naive objective for editing a single linear layer \( W \) attempts to balance editing new concepts (\( K_1 \)) while preserving old ones (\( K_0 \)):</p>

\[\min_{\Delta} \left( 
\underbrace{\| (W + \Delta)K_1 - V_1 \|^2}_{\text{Edit Error}} 
+ 
\underbrace{\| (W + \Delta)K_0 - V_0 \|^2}_{\text{Preservation Error}}
\right)\]

<p>Here, \( V_0 = WK_0 \) represents the original outputs we wish to maintain.</p>

<p><strong>The issue:</strong> Minimizing the edit error often requires distorting \( W \) in directions that inadvertently alter the output for \( K_0 \). This leads to the classic problem of <strong>“catastrophic forgetting.”</strong></p>
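<p>To see the interference concretely, here is a toy NumPy sketch (random matrices, not the actual CLIP layer) that solves the naive joint least-squares problem and measures how far the preserved outputs drift:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))    # the linear layer being edited (toy, not CLIP)
K0 = rng.normal(size=(d, 12))  # preserved keys
V0 = W @ K0                    # original outputs we wish to maintain
K1 = rng.normal(size=(d, 2))   # keys to edit
V1 = rng.normal(size=(d, 2))   # desired new outputs

# Naive objective: stack both terms into one least-squares system over Delta
K = np.hstack([K1, K0])
V = np.hstack([V1, V0])
Delta = (V - W @ K) @ np.linalg.pinv(K)  # row-wise least-squares minimizer

edit_err = np.linalg.norm((W + Delta) @ K1 - V1)
drift = np.linalg.norm((W + Delta) @ K0 - V0)  # nonzero: old knowledge moves
```

<p>Because the stacked system is overdetermined, the least-squares compromise leaves <code>drift</code> well above zero: the edit bleeds into the preserved keys.</p>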

<hr />

<h2 id="the-solution-null-space-constrained-editing-alphaedit">The Solution: Null-Space Constrained Editing (AlphaEdit)</h2>

<p>A recent method, <strong>AlphaEdit</strong>, solves this geometrically. Instead of trying to balance two competing errors, it restricts updates to the <strong>null space</strong> of the preserved knowledge.</p>

<p>The core idea is to construct an update \( \Delta \) that acts <strong>only</strong> where \( K_0 \) has no presence.</p>

<h3 id="1-constructing-the-projector">1. Constructing the Projector</h3>
<p>We treat the update as a low-rank modification projected onto a specific subspace. First, we analyze the covariance of the preservation keys via its eigendecomposition (equivalently the SVD, since \( K_0 K_0^T \) is symmetric positive semi-definite):</p>

\[K_0 K_0^T = U \Sigma U^T\]

<p>We select the eigenvectors in \( U \) corresponding to the smallest eigenvalues (effectively zero). Let \( \tilde{U} \) be the matrix of these “null” eigenvectors. We define our projection matrix \( P \) as:</p>

\[P = \tilde{U}\tilde{U}^T\]

<p><strong>The geometric intuition:</strong> Because \( \tilde{U} \) spans the null space of the input correlations, any vector projected by \( P \) is orthogonal to the existing knowledge \( K_0 \). Therefore:</p>

\[P K_0 \approx 0\]
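<p>A minimal NumPy sketch of the projector construction (the dimensions and number of keys are made up for illustration):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
K0 = rng.normal(size=(d, 3))            # rank 3, so the null space has dimension 5

# K0 K0^T is symmetric PSD, so eigendecomposition and SVD coincide
eigvals, U = np.linalg.eigh(K0 @ K0.T)  # eigenvalues in ascending order
null_mask = eigvals < 1e-8 * eigvals.max()
U_null = U[:, null_mask]                # the "null" eigenvectors (tilde-U)
P = U_null @ U_null.T                   # projector onto the null space

# Sanity checks: P is idempotent and annihilates the preserved keys
assert np.allclose(P @ P, P)
assert np.allclose(P @ K0, 0, atol=1e-6)
```

<p>Note that <code>numpy.linalg.eigh</code> returns eigenvalues in ascending order, so the near-zero ones sit at the front; the relative threshold separates them from the genuinely nonzero spectrum.</p>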

<h3 id="2-the-null-space-objective">2. The Null-Space Objective</h3>
<p>We constrain our update to be of the form \( \Delta P \). Substituting this into the naive objective:</p>

\[\min_{\Delta} \left( \| (W + \Delta P)K_1 - V_1 \|^2 + \| W K_0 + \underbrace{\Delta P K_0}_{\approx 0} - V_0 \|^2 \right)\]

<p>Because \( P K_0 \approx 0 \), the preservation term vanishes naturally (since \( WK_0 = V_0 \)). The constraint ensures we <strong>cannot</strong> hurt the old knowledge. The problem simplifies to:</p>

\[\min_{\Delta} \| (W + \Delta P)K_1 - V_1 \|^2\]

<h3 id="3-closed-form-solution">3. Closed-Form Solution</h3>
<p>This simplified objective admits a closed-form solution for the update:</p>

\[\Delta_{\text{edit}} = R K_1^T P (K_1 K_1^T P + \lambda I)^{-1}\]

<p>where \( R = V_1 - W K_1 \) is the residual error of the unedited model.</p>

<p>By projecting the update through \( P \), we ensure edits are applied strictly in the “empty space” of the model’s knowledge, allowing us to <strong>forget (errors) without forgetting (facts).</strong></p>
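<p>Putting the pieces together, here is a self-contained NumPy sketch of the closed-form update on toy matrices (<code>null_space_edit</code> is an illustrative helper, not the repository's API):</p>

```python
import numpy as np

def null_space_edit(W, K0, K1, V1, lam=1e-2):
    """Illustrative helper: apply the closed-form update W <- W + Delta P."""
    d = W.shape[1]
    # Projector onto the null space of the preserved-key covariance
    eigvals, U = np.linalg.eigh(K0 @ K0.T)
    U_null = U[:, eigvals < 1e-8 * eigvals.max()]
    P = U_null @ U_null.T
    R = V1 - W @ K1  # residual error of the unedited model
    # Delta = R K1^T P (K1 K1^T P + lam I)^{-1}; lam keeps the inverse well-posed
    Delta = R @ K1.T @ P @ np.linalg.inv(K1 @ K1.T @ P + lam * np.eye(d))
    return W + Delta @ P  # the update acts only through P, so Delta P K0 ~ 0

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))
K0 = rng.normal(size=(d, 3))  # preserved keys
K1 = rng.normal(size=(d, 2))  # keys to edit
V1 = rng.normal(size=(d, 2))  # desired new outputs
W_new = null_space_edit(W, K0, K1, V1)

preserve_drift = np.linalg.norm(W_new @ K0 - W @ K0)  # essentially zero
edit_gain = np.linalg.norm(W @ K1 - V1) - np.linalg.norm(W_new @ K1 - V1)
```

<p>On this toy problem the preserved outputs are essentially untouched while the edit error shrinks substantially, which is exactly the geometric guarantee the projection buys.</p>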

<hr />

<h2 id="guidance-from-the-unet">Guidance From the UNet</h2>

<blockquote>
  <p>How do we choose the target embeddings \( V_1 \) for the concepts we want to edit?</p>
</blockquote>

<p>A naive choice is \( V_1 = W K_1^* \), where \( K_1^* \) are embeddings of the target (replacement) concepts. However, this quickly leads to <strong>overfitting</strong> — edits perform well on training prompts but fail on unseen ones.</p>

<p>The fix came from leveraging the <strong>UNet</strong>. I designed a small refinement loop that adjusts \( V_1 \) so that the UNet’s outputs for the edited and target prompts align:</p>

\[V_1^{(t+1)} = V_1^{(t)} - \eta \nabla_{V_1} \, \text{MSE}(\text{UNet}(V_1^{(t)}), \text{UNet}(V_1^*))\]

<p>This <strong>UNet-guided refinement</strong> provides stable target representations that generalize well, even for indirect prompts that don’t explicitly mention the forgotten concept. For example, after editing “Van Gogh” \( \to \) “Monet” in CLIP, the prompt <em>“starry night”</em> generated a Monet-style painting, even though “starry night” wasn’t seen during the editing process.</p>
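<p>A sketch of the refinement loop. The real UNet is a frozen nonlinear denoiser and the gradient comes from backpropagation; here a hypothetical linear stand-in keeps the gradient analytic and the snippet self-contained (the step size and dimensions are made up):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
# Hypothetical linear stand-in for the frozen UNet
A = rng.normal(size=(d, d)) / np.sqrt(d)

def unet(v):
    return A @ v

V1_star = rng.normal(size=d)  # embedding target for the replacement concept
V1 = rng.normal(size=d)       # initial guess for V1, e.g. W @ K1_star

eta = 0.5  # step size (tuned for this toy problem only)
mse_before = np.mean((unet(V1) - unet(V1_star)) ** 2)
for _ in range(500):
    # gradient of MSE(unet(V1), unet(V1_star)) w.r.t. V1, analytic for a linear map
    grad = (2.0 / d) * A.T @ (unet(V1) - unet(V1_star))
    V1 -= eta * grad
mse_after = np.mean((unet(V1) - unet(V1_star)) ** 2)
```

<p>The key design choice is that the loss is measured on the UNet’s <em>outputs</em>, not on the embeddings themselves, so \( V_1 \) only needs to match the target along directions the UNet actually responds to.</p>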

<hr />

<h2 id="experiments">Experiments</h2>

<p>The approach was evaluated on <strong>Stable Diffusion 1.4</strong>, using 20 diverse concepts (objects, actions, and art styles).</p>

<p><strong>Experimental Setup:</strong></p>
<ul>
  <li><strong>Dataset:</strong> 20 diverse concepts.</li>
  <li><strong>Prompts:</strong> 10 prompts per concept (5 direct, 5 indirect).</li>
  <li><strong>Generation:</strong> 20 images generated per prompt.</li>
  <li><strong>Evaluation:</strong> <strong>LLaVA v1.5 (7B)</strong> was used to detect whether each image contained the target concept.</li>
</ul>

<p><strong>Metrics:</strong></p>
<ul>
  <li><strong>Forget Score:</strong> Fraction of images where the edited concept was <em>not recognized</em> by LLaVA.</li>
  <li><strong>Retain Score:</strong> Fraction of images where unedited concepts were <em>correctly recognized</em>.</li>
</ul>
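<p>Assuming the standard harmonic mean of the two scores (the exact aggregation used by the leaderboard may differ), the combined metric reported below can be computed as:</p>

```python
def harmonic_mean(forget, retain):
    """Combine forget and retain into one score; 0 if either collapses."""
    if forget == 0 or retain == 0:
        return 0.0
    return 2 * forget * retain / (forget + retain)

# A method that forgets well but hurts retention is penalized more
# heavily than an arithmetic mean would penalize it:
score = harmonic_mean(0.95, 0.40)  # below the arithmetic mean of 0.675
```

<p>The harmonic mean rewards methods that do well on <em>both</em> axes, which is why it is a natural summary of the forget–retain tradeoff.</p>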

<p>The results show that our method improves the <em>forget–retain tradeoff</em> by roughly <strong>20%</strong> compared to prior baselines, with most concepts unlearned using only <strong>one edit</strong> in CLIP.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: left">Method</th>
      <th style="text-align: center">Harmonic Mean (Forget \( \times \) Retain)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left">Unedited Model</td>
      <td style="text-align: center">0.313</td>
    </tr>
    <tr>
      <td style="text-align: left">Unified Concept Editing (UCE)</td>
      <td style="text-align: center">0.480</td>
    </tr>
    <tr>
      <td style="text-align: left">Erasing Stable Diffusion (ESD)</td>
      <td style="text-align: center">0.504</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>Our Method (Null-Space Editing)</strong></td>
      <td style="text-align: center"><strong>0.642</strong></td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="further-reading">Further Reading</h2>

<p>If you’d like to dive deeper, the full code and a detailed report of experiments are available on GitHub:</p>

<p><a href="https://github.com/Om2005Prakash/Editing-Concepts-in-Stable-Diffusion"><img src="https://img.shields.io/badge/GitHub-View_Code-black?style=for-the-badge&amp;logo=github" alt="GitHub" /></a></p>]]></content><author><name>Om Prakash</name></author><category term="media" /><summary type="html"><![CDATA[]]></summary></entry></feed>