from Hacker News

Hard problems in social media archiving

by surprisetalk on 2/12/26, 3:19 PM with 11 comments

  • by pjc50 on 2/16/26, 2:17 PM

    Good that they actually raise the question of users not wanting to be archived. I think the semi-ephemerality of channel-based systems like Discord is increasingly popular partly because of various sorts of "cancel wars": the well- or ill-intentioned capture and use of posts out of context.

  • by zoobab on 2/16/26, 10:27 AM

    HTTP is not designed for mirroring.

    FTP was easy to mirror with "lftp> mirror -p".

    Easy mirroring and archive-level maintenance (let's say the network always maintains 3 copies at minimum) should be built into the "social media" protocols.
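
    As a rough illustration of the "always maintain 3 copies" idea, here is a minimal Python sketch of the bookkeeping side only. The `holdings` mapping, the host names, and the function `under_replicated` are all hypothetical; no existing social media protocol is being described.

```python
def under_replicated(holdings, min_copies=3):
    """Return {item: missing_copies} for items held by fewer than min_copies hosts.

    `holdings` is a hypothetical mapping of item id -> iterable of host names
    that currently store a copy of that item.
    """
    shortfall = {}
    for item, hosts in holdings.items():
        copies = len(set(hosts))  # count each host once, even if listed twice
        if copies < min_copies:
            shortfall[item] = min_copies - copies
    return shortfall


if __name__ == "__main__":
    holdings = {
        "post-1": ["host-a", "host-b", "host-c"],
        "post-2": ["host-a"],
        "post-3": [],
    }
    # Items listed here would need to be copied to additional hosts.
    print(under_replicated(holdings))
```

    A real network would also need to discover which hosts hold what and verify that the copies are intact, which is the harder part.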

  • by garethsprice on 2/16/26, 5:00 PM

    Would it make sense to archive every word every person ever speaks? At what point does archiving everything people do constrain their ability to live freely in the present?

    Despite being in written form (decreasingly so), social media feels more like a private conversation in a public space - and like all such conversations, it deserves the right to decay, so that we do not all become prisoners of the dumbest thing we ever said.

    The transformative work of curation - choosing which pieces to save, to turn into books, diary entries, or blog posts that record context for posterity - is a valid part of how archivists build the corpus of history. Harvesting all the raw data simply because we can is a dangerous road.

  • by drnick1 on 2/16/26, 5:56 PM

    Why would you want to archive social media? It's always been slop and now it's increasingly AI slop.

    We should really go back to people hosting their own websites when they want to share something publicly. Just plain HTML like in 1995.

  • by binarykult on 2/16/26, 11:36 AM

    Well, if only we could still archive full Instagram profiles, for example ...

  • by CGMthrowaway on 2/16/26, 4:43 PM

    TL;DR: The actual (formally) hard problems:

      Defining archive boundaries in a dense social graph (graph traversal + stopping criteria without exploding scope)
      Entity resolution across pseudonymous accounts 
      Reconstructing opaque ranking algorithms from outputs
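
    To make the first item concrete, here is a minimal Python sketch of a graph traversal with two simple stopping criteria: a depth limit and a node budget. The adjacency-dict representation and the names `bounded_crawl`, `max_depth`, and `max_nodes` are assumptions for illustration. Choosing those limits in a principled way is the actual hard problem and is not solved here.

```python
from collections import deque


def bounded_crawl(graph, seeds, max_depth=2, max_nodes=1000):
    """Breadth-first traversal that stops at a depth limit or a node budget.

    `graph` is a hypothetical adjacency dict: account -> list of linked accounts.
    Returns the accounts inside the archive boundary, in visit order.
    """
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    visited = []
    while queue and len(visited) < max_nodes:
        node, depth = queue.popleft()
        visited.append(node)
        if depth == max_depth:
            continue  # boundary: archive this account but do not expand it
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return visited


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
    print(bounded_crawl(graph, ["a"], max_depth=2))
```

    Without either limit, the traversal would eventually pull in every account reachable from the seeds, which is the scope explosion the list item refers to.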