[HN Gopher] Forking Chrome to turn HTML into SVG
       ___________________________________________________________________
        
       Forking Chrome to turn HTML into SVG
        
       Author : fathyb
       Score  : 222 points
       Date   : 2022-11-13 17:06 UTC (5 hours ago)
        
 (HTM) web link (fathy.fr)
        
       | lifthrasiir wrote:
       | > What if we could also vectorize 2D <canvas> elements controlled
       | by JavaScript? Turns out, Chromium has this capability built-in
       | for printing:
       | 
       | I'm very surprised to hear this. So printing, either to PDF or to
       | actual printers, may reveal more information about what was drawn
       | to the canvas than normal display, especially if no effort has
       | been made to remove overdrawn paint records. That can have an
       | interesting, if only hypothetical, consequence...
        
         | tyingq wrote:
         | Pretty sure canvas.toDataURL() is or was a fingerprinting
         | method.
        
         | andybak wrote:
         | To be fair, if anyone is sending anything to the client and
         | assumes it's not visible, then they are fair game.
         | 
         | I just hope the code around password entry fields is carefully
         | audited. That's all on the client.
        
           | kevincox wrote:
           | Yes, but there is more to it than that. What if I, as the
           | client, try to print a page or export it to PDF? I think that
           | there is nothing sensitive visible on the page, so I share the
           | result. It turns out that there was actually sensitive info
           | in the canvas that was not visible due to something like
           | overdraw.
           | 
           | As a simple example imagine that an image is drawn to the
           | canvas and then blacked out. You wouldn't expect that the
           | saved PDF may contain those as separate layers.
           | 
           | Of course this highlights an existing issue with complex
           | formats. You need to be very careful before sharing complex
           | documents.
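A toy model of the overdraw problem described in this subthread. It assumes a recorder that keeps every draw call in order; Skia's real recording type is SkPicture, and the `Rect`, `DrawOp` and `fully_occluded` names here are hypothetical. A raster export only shows the final pixels. A vector export that replays the whole list still carries operations that were painted over.

```python
from dataclasses import dataclass

# Sketch only: hypothetical types, not Chromium's or Skia's recording API.

@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float

    def covers(self, other: "Rect") -> bool:
        """True if this rectangle fully contains `other`."""
        return (self.x <= other.x and self.y <= other.y
                and self.x + self.w >= other.x + other.w
                and self.y + self.h >= other.y + other.h)

@dataclass(frozen=True)
class DrawOp:
    kind: str          # e.g. "image" or "fill" (illustrative labels)
    bounds: Rect
    opaque: bool = True

def fully_occluded(ops: list[DrawOp]) -> list[DrawOp]:
    """Return ops completely covered by a single later opaque op.

    These never show up in a raster export, but a vector export that
    replays the recorded list verbatim still contains them.
    """
    hidden = []
    for i, op in enumerate(ops):
        if any(later.opaque and later.bounds.covers(op.bounds)
               for later in ops[i + 1:]):
            hidden.append(op)
    return hidden

ops = [
    DrawOp("image", Rect(10, 10, 100, 50)),  # the sensitive image
    DrawOp("fill", Rect(0, 0, 200, 100)),    # black box drawn on top
]
print(fully_occluded(ops))
```

The image op is reported because a later opaque fill covers it entirely. The check only looks for one covering op, so it misses partial overlaps that add up to full coverage.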
        
           | version_five wrote:
           | There's a story from years ago (I couldn't find it) about
           | some government or legal documents having info redacted, but
           | whoever did it just used some pdf editing tool to draw black
           | boxes over the redacted parts, so all the info was still in
           | the pdfs.
           | 
           | Edit: I found this but I'm not sure it's the one I'm
           | remembering: https://www.techdirt.com/2014/01/28/new-york-times-suffers-r...
        
             | perth wrote:
             | This happened with the Maxwell redacted court docs
        
         | mk_stjames wrote:
         | This is exactly the case. I've done conversions before where it
         | was possible to see and extract underlying, hidden elements,
         | that were not visible or even detectable in the rendered
         | webpage in a browser.
         | 
         | This is actually a somewhat common method when it comes to a
         | bit of corporate sleuthing. Any time you see a pretty website
         | with vector-y graphics, maybe engineering-drawing
         | representations, if the data hasn't been stripped completely
         | or redrawn you can extract information that people would
         | otherwise assume was unknowable.
         | 
         | In a recent example... I did this on a startup company's page
         | involving a product where they had a CAD-like side view drawing
         | of one of their products... but the base file (in this case it
         | was an SVG) driving the page actually contained multiple hidden
         | views of the same product and other products and at the 'real'
         | precision of what likely was a DXF export from a CAD program,
         | given to the web team. This allowed a critical dimension of an
         | unannounced product to be precisely determined (to three
         | significant figures) which was a spec that had not been
         | publicly released...
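A minimal sketch of that kind of inspection, using Python's standard `xml.etree.ElementTree`. It assumes the hidden views are marked with `display`, `visibility` or `opacity`, either as presentation attributes or in an inline `style`. Content hidden through external CSS, unreferenced `<defs>`, or off-canvas transforms would need more work. `hidden_elements` is a hypothetical helper, not an existing tool.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

def style_props(el):
    """Parse an inline style="a: b; c: d" attribute into a dict."""
    props = {}
    for decl in el.get("style", "").split(";"):
        if ":" in decl:
            key, value = decl.split(":", 1)
            props[key.strip()] = value.strip()
    return props

def hidden_elements(svg_text):
    """Return (tag, id) pairs for elements that are hidden but still in the file.

    Sketch only (hypothetical helper). Inline style wins over presentation
    attributes, as in CSS. Only each element's own properties are checked,
    so for a hidden group only the group itself is reported.
    """
    found = []
    for el in ET.fromstring(svg_text).iter():
        props = style_props(el)
        display = props.get("display", el.get("display"))
        visibility = props.get("visibility", el.get("visibility"))
        opacity = props.get("opacity", el.get("opacity"))
        if (display == "none" or visibility == "hidden"
                or (opacity is not None and float(opacity.rstrip("%")) == 0)):
            found.append((el.tag.replace(SVG_NS, ""), el.get("id")))
    return found
```

Each reported element's subtree (paths, dimensions, text) can then be read directly from the parsed tree.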
        
         | aidos wrote:
         | Totally. We see architectural drawings that go through a number
         | of revisions and it's not uncommon for designers to simply
         | cover a whole section with a white box and then draw on top of
         | it.
         | 
         | Also within PDFs (and svgs) you normally clip the area you're
         | going to draw into to bound it (sort of like overflow:hidden)
         | and anything outside of that doesn't display, but it's still
         | there and accessible.
         | 
         | I marvel more at the fact that software is capable of figuring
         | out all the occlusions so you can print the stuff on a plotter.
         | CAD drawings have up to 2M individual vectors in them. It's
         | impressive that it works at all, to be honest.
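A sketch of finding content that is clipped away in an SVG but still present in the file. It makes several simplifying assumptions:

- each `<clipPath>` holds a single `<rect>`;
- the `clip-path` attribute sits directly on the `<rect>` being drawn;
- there are no transforms and no non-user-space units.

`clipped_away` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"svg": "http://www.w3.org/2000/svg"}

def _box(el):
    """(x, y, width, height) of a <rect>, defaulting missing values to 0."""
    return tuple(float(el.get(k, "0")) for k in ("x", "y", "width", "height"))

def _overlaps(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def clipped_away(svg_text):
    """Return ids of <rect>s lying entirely outside their rectangular clip.

    Sketch only: toy model under the assumptions stated above.
    """
    root = ET.fromstring(svg_text)
    clips = {}
    for cp in root.iterfind(".//svg:clipPath", NS):
        r = cp.find("svg:rect", NS)
        if r is not None:
            clips[cp.get("id")] = _box(r)
    out = []
    for rect in root.iterfind(".//svg:rect", NS):
        ref = rect.get("clip-path", "")
        if ref.startswith("url(#") and ref.endswith(")"):
            clip = clips.get(ref[5:-1])
            if clip is not None and not _overlaps(_box(rect), clip):
                out.append(rect.get("id"))
    return out
```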
        
         | [deleted]
        
       | steren wrote:
       | Much cleaner than my hack of Chrome -> PDF -> Inkscape -> SVG
       | 
       | https://labs.steren.fr/2020/05/08/screenshot-as-svg/
        
         | cjr wrote:
         | ha, I'm also guilty of using this method on https://urlbox.io
         | to power our SVG screenshots.
         | 
         | To be honest, it works quite well, but there are quite a few
         | bugs in Chromium's PDF rendering, especially when it comes to
         | determining the correct page width to apply media queries to,
         | which sometimes affects the accuracy of these SVGs.
        
       | SigmundA wrote:
       | Reminds me of https://github.com/gliffy/canvas2svg at a different
       | level of abstraction.
       | 
       | I believe PDF.js incorporated some form of canvas2svg to try to get
       | an SVG backend working, which would allow high-resolution printing
       | to PDF, but I'm not sure where that's at. I believe printing through
       | PDF.js is blurry due to memory constraints, since with normal
       | canvas rendering PDF pages just end up as bitmaps sent to the printer.
       | 
       | SVG stays vector through Chromium's print pipeline,
       | resulting in much lower memory usage while producing much higher-DPI
       | final output. I would imagine this is due to SVG being turned
       | into Skia drawing commands that end up as PDF, which then gets
       | printed through PDFium?
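The pattern behind canvas2svg is to replace the pixel-producing canvas with an object that has the same drawing methods but records SVG elements. Below is a minimal Python sketch of that pattern. The method names mirror the Canvas 2D API's `fillStyle`, `fillRect` and `fillText`. `SvgContext` itself is hypothetical and covers only those three members.

```python
from xml.sax.saxutils import escape, quoteattr

class SvgContext:
    """Records a few Canvas 2D-style calls as SVG elements instead of pixels.

    Sketch only: paths, transforms, save/restore and strokes are not modelled.
    """

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.fillStyle = "#000"  # black, like a canvas context's default
        self._nodes = []

    def fillRect(self, x, y, w, h):
        self._nodes.append(
            f'<rect x="{x}" y="{y}" width="{w}" height="{h}" '
            f'fill={quoteattr(self.fillStyle)}/>'
        )

    def fillText(self, text, x, y):
        self._nodes.append(
            f'<text x="{x}" y="{y}" fill={quoteattr(self.fillStyle)}>'
            f'{escape(text)}</text>'
        )

    def to_svg(self):
        return (
            f'<svg xmlns="http://www.w3.org/2000/svg" width="{self.width}" '
            f'height="{self.height}">' + "".join(self._nodes) + "</svg>"
        )

ctx = SvgContext(200, 100)
ctx.fillStyle = "steelblue"
ctx.fillRect(10, 10, 80, 40)
ctx.fillText("Q3 < Q4", 10, 80)
print(ctx.to_svg())
```

Drawing code written against the canvas API can call this object unchanged. The recorded output stays vector, which is the property that matters for high-resolution printing.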
        
       | pornel wrote:
       | It'd be wonderful if this could be integrated into the browser
       | and the OS to provide SVG screenshots.
        
       | hedora wrote:
       | I'd love to see some sort of caching proxy that did this for news
       | stories, etc.
       | 
       | Basically, convert everything to an archival format, then I'll
       | browse the archive instead of whatever adversarial server side /
       | javascript junk the site is serving.
        
         | bawolff wrote:
         | Well both pdf and svg support javascript (albeit pdf is
         | extremely limited)
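An SVG snapshot served by an archival proxy, like the one suggested above, would therefore still need its scripting removed. Here is a rough standard-library sketch. It assumes inline `<script>` elements, `on*` event-handler attributes and `javascript:` links are the only vectors of interest. It is not a security sanitizer, and `strip_scripts` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"
SCRIPT_TAG = f"{{{SVG}}}script"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

def strip_scripts(svg_text):
    """Remove <script> elements, on* handlers and javascript: links.

    Sketch only: ignores <foreignObject>, CSS, external references and
    many other ways an SVG can carry active content.
    """
    ET.register_namespace("", SVG)  # serialize without an ns0: prefix
    root = ET.fromstring(svg_text)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == SCRIPT_TAG:
                parent.remove(child)
    for el in root.iter():
        for attr in list(el.attrib):
            value = el.attrib[attr].strip().lower()
            if attr.lower().startswith("on") or (
                attr in ("href", XLINK_HREF) and value.startswith("javascript:")
            ):
                del el.attrib[attr]
    return ET.tostring(root, encoding="unicode")
```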
        
         | ccouzens wrote:
         | If you could proxy the page to SVG without Javascript, couldn't
         | you also proxy the page to HTML without Javascript?
         | 
         | Either way, you'd probably want your proxy to wait for any
         | onload JavaScript to run before snapshotting the page.
        
       | metayrnc wrote:
       | Can someone give some example use cases? I am curious as to how
       | this is used. Thank you.
        
         | commotionfever wrote:
         | If it works how I think it does, this could be really nice for
         | cooking up some infographics in a CSS framework like Tailwind,
         | then making some SVGs for a GitHub README.
         | 
         | For example, I made this one[1] with Tailwind, but I just ended
         | up taking a PNG screenshot.
         | 
         | [1]
         | https://github.com/sentriz/socr/blob/master/.github/socr.png
        
         | danielvaughn wrote:
         | IMO the use case is limited but interesting. The most obvious
         | would be product screenshots for landing pages, although
         | typically design tools handle that well enough.
         | 
         | I'm currently building a web-app for building web pages, and
         | I'd love for the user to be able to view a thumbnail gallery of
         | all the pages they've built. This tool would allow me to build
         | a zooming feature pretty easily.
         | 
         | Outside of those two, I'd imagine the use cases are fairly
         | limited.
        
           | btown wrote:
           | The thing about having access to the Skia render graph is
           | that all of a sudden you're no longer limited to product
           | screenshots and screen recordings. Imagine a pipeline where
           | you can export someone's interaction session with a site,
           | pixel-perfect, into DaVinci Resolve or Blender or Unity as a
           | fully annotated DOM-advised render node hierarchy, with
           | consistent node identities over time, of every rendered
           | element on the page as it changes across frames. That's _way_
           | more powerful than just pixels.
           | 
           | Imagine flying through your site in 3D (or even VR) with full
           | control over timing, being able to explode and un-explode
           | your DOM elements as they transition into being - the type of
           | thing that only Apple would do for their WWDC demos with
           | dedicated visualization teams.
           | 
           | The start is to be able to see the rendering engine as a
           | generator for not just raster data over time, but vector data
           | over time. Of course, there's a lot of work to do from there,
           | but this is the core leap.
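The core data structure in that idea is a per-frame map from a stable node identity to that node's geometry. Suppose a renderer could export frames as `{node_id: (x, y, w, h)}` dicts. This is a hypothetical export format, not an existing Chromium or Skia API. Diffing consecutive frames and tweening between them for an animation tool would then be simple:

```python
# Sketch only: assumes a hypothetical per-frame export keyed by stable node ids.
Box = tuple[float, float, float, float]  # x, y, width, height
Frame = dict[str, Box]                   # node id -> geometry

def diff_frames(prev: Frame, curr: Frame):
    """Return (added, removed, moved) sets of node ids between two frames."""
    added = curr.keys() - prev.keys()
    removed = prev.keys() - curr.keys()
    moved = {k for k in prev.keys() & curr.keys() if prev[k] != curr[k]}
    return added, removed, moved

def tween(prev: Frame, curr: Frame, t: float) -> Frame:
    """Linearly interpolate nodes present in both frames (t=0 prev, t=1 curr)."""
    return {
        k: tuple(a + (b - a) * t for a, b in zip(prev[k], curr[k]))
        for k in prev.keys() & curr.keys()
    }
```

Stable ids are what make this work. Without them, a tool could not tell a moved element from one that was removed and re-added.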
        
         | Scalene2 wrote:
         | Great for screenshots to render in a video.
        
         | convolvatron wrote:
         | If we can reduce the size of the basis footprint for a browser
         | implementation, we can more easily produce new browsers (i.e., by
         | implementing a fully general Path and font rendering).
        
           | mrkramer wrote:
           | It would be cool if converting HTML to SVG actually saved
           | you bandwidth and all the rest that goes with web
           | requests. Imagine a web browser that only supports SVG and
           | converts all HTML to SVG; when browsing the web, you would
           | only look at screenshots of websites and webpages. It would
           | be something like a read-only browser. It is already
           | possible[0], though it is not enabled by default in Chrome,
           | nor is it an exclusive feature.
           | 
           | [0] https://frankgroeneveld.nl/2021/08/24/most-underused-browser...
        
         | GranPC wrote:
         | I do something similar - but using the Print command and
         | converting the PDF to SVG - to import websites into Blender for
         | flashy animations. This allows me to neatly animate things
         | in/out, and zoom into details without pixelation.
        
         | yvoschaap wrote:
         | I tried something similar to render thumbnails of
         | websites (at a very small file size). E.g.
         | https://twitter.com/yvoschaap/status/1446397003316047872
        
           | mrkramer wrote:
           | Isn't this something like archive.ph is doing? Snapshotting
           | and screenshotting websites. I'm referring both to you and
           | the op.
        
             | dj_gitmo wrote:
             | https://archive.ph/1NNZr
             | 
             | That looks like a Web Archive (WARC) and a PNG screenshot.
             | I think you can make a screenshot with CasperJS. The WARC
             | can be created by wget.
        
               | mrkramer wrote:
               | Yeah, you are right about the PNG but wrong about the WARC.
               | Archive.ph doesn't use WARC.
        
               | dredmorbius wrote:
               | What does it use, if you know?
               | 
               | Source?
        
               | mrkramer wrote:
               | Their FAQ says: https://archive.ph/faq#:~:text=Which%20parts%20of,of%201024x....
               | 
               | Wikipedia says: https://en.wikipedia.org/wiki/Archive.today#:~:text=Web%20pa....
               | 
               | So I assumed they don't use it, but I don't know for sure.
        
           | marginalia_nu wrote:
           | How small are you getting them? I'm straight up
           | screenshotting websites (e.g.
           | https://search.marginalia.nu/screenshot/245804). They seem to come
           | in at 17 Kb on average, based on a sample size of 550K
           | screenshots.
        
           | codetrotter wrote:
           | That is super neat! Did you end up having any
           | users/customers?
        
       | mk_stjames wrote:
       | I've done this for a project long ago, incredibly lazily, by
       | using chrome/chromium to PDF and piping to a PDF to SVG tool.
       | There are a few PDF to SVG pathways, I remember it using Cairo
       | and the whole thing was quick and consistent.
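Here is a sketch of that lazy pipeline. It assumes a `chromium` binary that accepts the `--headless` and `--print-to-pdf=` switches. For the second step it uses Poppler's `pdftocairo`, whose `-svg` flag renders through Cairo. The comment doesn't name the tool it used, so `pdftocairo` is just one Cairo-based option. Binary names and file paths are placeholders. Because this goes through printing, the PDF step applies print CSS and pagination.

```python
import shlex
import subprocess

def html_to_svg_commands(url, pdf_path="page.pdf", svg_path="page.svg",
                         chrome="chromium"):
    """Build the Chrome -> PDF and PDF -> SVG command lines (not executed).

    Sketch only: binary names and paths are assumptions; adjust per system.
    """
    print_cmd = [chrome, "--headless", f"--print-to-pdf={pdf_path}", url]
    # -f/-l pick the first and last page to convert; here only page 1.
    convert_cmd = ["pdftocairo", "-svg", "-f", "1", "-l", "1", pdf_path, svg_path]
    return print_cmd, convert_cmd

def run_pipeline(url):
    """Run both steps, echoing each command; raises if either tool fails."""
    for cmd in html_to_svg_commands(url):
        print("$", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("https://example.com")` would write `page.pdf` and then `page.svg` in the working directory, provided both tools are installed.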
        
         | crazygringo wrote:
         | That was my first thought as well.
         | 
         | I'm genuinely curious if there are any advantages in
         | Chrome->SVG as opposed to Chrome->PDF->SVG.
         | 
         | Are there any graphical effects (e.g. produced by CSS, like
         | blurry text shadows or something) that PDF can't render without
         | falling back to bitmap but SVG can?
         | 
         | Or is there other data that SVG usefully preserves that PDF
         | discards, such as actual source text strings used for text? (As
         | opposed to PDF where getting text out, e.g. when copying to
         | clipboard, usually involves a lot of ugly "reverse
         | engineering".)
        
           | femto113 wrote:
           | I think the path is more clearly thought of as HTML+CSS ->
           | display list -> *. The display list is some abstract
           | definition of what needs to be drawn by a renderer. In theory
           | anything that fully describes all possible operations works,
           | including bespoke things like SkPicture or general purpose
           | graphical languages like SVG or PostScript. In practice
           | there's never a single language that can describe everything,
           | because display capabilities evolve and new operations are
           | added all the time (e.g. advanced typography features for
           | fonts). PDF can cover a really broad set of use cases, but it
           | also wasn't designed as an intermediate format (it was
           | closely tied to the PDF reader) so it's easier to get into
           | than out of. SVG is possibly a better candidate, as it is
           | already used effectively as an intermediate representation
           | (e.g. D3.js "renders" to SVG).
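To make the "display list -> *" idea concrete, here is a toy display list with two backends, SVG and PostScript. The op types and backend functions are illustrative, not Skia's API. A backend that meets an op it can't express raises `NotImplementedError`. In a real pipeline, that is where you would fall back to rasterizing, which is the limitation described above.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape

# Sketch only: illustrative op types, not a real rendering engine's API.

@dataclass(frozen=True)
class FillRect:
    x: float
    y: float
    w: float
    h: float
    rgb: tuple  # (r, g, b), each channel in 0..1

@dataclass(frozen=True)
class DrawText:
    x: float
    y: float    # baseline, measured from the top like CSS/SVG
    text: str
    size: float = 12

def svg_backend(ops, width, height):
    out = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for op in ops:
        if isinstance(op, FillRect):
            r, g, b = (round(c * 255) for c in op.rgb)
            out.append(f'<rect x="{op.x}" y="{op.y}" width="{op.w}" height="{op.h}" '
                       f'fill="rgb({r},{g},{b})"/>')
        elif isinstance(op, DrawText):
            out.append(f'<text x="{op.x}" y="{op.y}" font-size="{op.size}">'
                       f'{escape(op.text)}</text>')
        else:
            raise NotImplementedError(f"SVG backend: {type(op).__name__}")
    out.append("</svg>")
    return "\n".join(out)

def ps_string(s):
    """Escape text as a PostScript string literal (ASCII only in this sketch)."""
    return "(" + s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"

def postscript_backend(ops, width, height):
    out = ["%!PS", f"%%BoundingBox: 0 0 {width} {height}"]
    for op in ops:
        if isinstance(op, FillRect):
            r, g, b = op.rgb
            # PostScript's origin is bottom-left, so flip the y axis.
            out.append(f"{r} {g} {b} setrgbcolor "
                       f"{op.x} {height - op.y - op.h} {op.w} {op.h} rectfill")
        elif isinstance(op, DrawText):
            out.append(f"0 0 0 setrgbcolor /Helvetica findfont {op.size} scalefont "
                       f"setfont {op.x} {height - op.y} moveto {ps_string(op.text)} show")
        else:
            raise NotImplementedError(f"PostScript backend: {type(op).__name__}")
    out.append("showpage")
    return "\n".join(out)
```

The same `ops` list can be fed to either function. Adding a target means writing one more backend; the layout step that produced the list does not change.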
        
           | aidos wrote:
           | Neither PDF nor SVG does text layout. They're both pretty
           | similar really, though the PDF spec is really deep and broad
           | to cater for a million things.
           | 
           | My advice to everyone regarding PDFs is to crack them open by
           | running `mutool clean -d file.pdf` and opening the result in a text
           | editor. They're just a tree (well, graph, I guess) of obvious
           | objects.
           | 
           | PS: mutool convert does a good job of converting from PDF to
           | SVG in a fairly faithful way.
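Much of what makes a raw PDF unreadable is compressed content streams, and `-d` tells `mutool clean` to decompress them. The following is a deliberately naive Python illustration of that step. It assumes unencrypted `/FlateDecode` streams and scans the raw bytes with a regex instead of parsing the object graph. Real files with object streams, filter chains or unusual line endings need a proper tool such as mutool. `inflate_streams` is a hypothetical name.

```python
import re
import zlib

# Sketch only. Non-greedy match from "stream" + EOL to the next "\nendstream".
# A "\r" before that "\n" stays in the payload; zlib ignores bytes after the
# end of the compressed data.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\nendstream", re.S)

def inflate_streams(pdf_bytes):
    """Return the decompressed payload of every stream zlib can inflate."""
    out = []
    for m in STREAM_RE.finditer(pdf_bytes):
        try:
            out.append(zlib.decompress(m.group(1)))
        except zlib.error:
            pass  # not Flate-compressed (e.g. DCTDecode JPEG data) or corrupt
    return out
```

On a typical page, the inflated content stream shows the text and path operators (`BT`, `Tj`, `re`, `f`, ...) that `mutool clean -d` exposes for reading in an editor.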
        
             | DrewADesign wrote:
             | Do you mean there's no _dynamic_ text layout? SVG and PDF
             | have perfect text placement capability, but I've never
             | even looked to see if they support defining broadly
             | applicable rules for text presentation.
        
               | aidos wrote:
               | I mean there's no layout engine to do things like
               | wrapping and line height etc. Everything is explicitly
               | positioned.
               | 
               | PDF seems a bit more bonkers because you render text as
               | strings of glyphs and the conversion back to text is an
               | afterthought. There's a ToUnicode map that says "glyph 8
               | in the embedded font is an 'X'", but that's there for
               | copy-pasting and searching, not for rendering. PDFs are built
               | to render glyphs at positions.
               | 
               | Edit: to go full meta, there are Type3 fonts where each
               | glyph itself is defined as a PDF graphics stream. Which
               | actually leads you into what's inside a font. Guess
               | what? Lots of them look just like PDFs inside, because
               | the glyphs are defined in PostScript. Fonts and PDFs
               | kinda grew up together, and once you start digging into
               | them the similarities are striking.
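The ToUnicode map mentioned above is a small CMap program. Its `bfchar` sections pair a glyph code with a UTF-16BE hex string. Below is a minimal parser for just those sections, using the "glyph 8 is an X" example from the comment. It is a sketch: `bfrange` sections, code-space ranges and the rest of the CMap syntax are ignored, and `parse_bfchar` is a hypothetical name.

```python
import re

BFCHAR_BLOCK = re.compile(r"beginbfchar(.*?)endbfchar", re.S)
HEX_PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")

def parse_bfchar(cmap_text):
    """Map glyph codes to Unicode text using a CMap's bfchar sections only.

    Sketch only: bfrange and other CMap operators are not handled.
    """
    mapping = {}
    for block in BFCHAR_BLOCK.findall(cmap_text):
        for src, dst in HEX_PAIR.findall(block):
            # Destinations are UTF-16BE code units written as hex; one glyph
            # can map to several characters (e.g. an "fi" ligature).
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

cmap = """
3 beginbfchar
<0008> <0058>
<0009> <0048>
<000A> <00660069>
endbfchar
"""
glyphs = [9, 8]  # glyph codes as they might appear in a content stream
print("".join(parse_bfchar(cmap)[g] for g in glyphs))  # prints HX
```

The glyph codes drive rendering; this table is only consulted when the text is copied or searched.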
        
           | ccouzens wrote:
           | If you print to PDF, you'll have the page's print css
           | applied. And it is probably paginated.
           | 
           | If you go direct to SVG the capture will use the screen css
           | and not be paginated.
        
             | mk_stjames wrote:
             | Yes, this. In the project I was doing, using Chromium from the
             | command line, I remember having options to set the
             | pagination to a custom resolution, which I used to define a
             | render 'window' as if the browser screen were something
             | like a 1600x18000 monitor. So I had the entire webpage
             | displayed like a full scroll, without the page breaks it
             | would have if you just printed a PDF from Chrome. This
             | allowed me to then extract this giant, full-length vector
             | graphics result of diagrams and text into a single SVG that
             | was perfectly spaced and rendered in the aspect ratio I
             | wanted.
        
             | cjr wrote:
             | It's also possible to emulate screen media queries[0] so
             | that the pdf output uses the regular screen css.
             | 
             | [0] https://chromedevtools.github.io/devtools-protocol/tot/Emula...
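In DevTools Protocol terms, combining the two suggestions above (screen media plus one very tall custom page) takes two commands:

1. `Emulation.setEmulatedMedia` with `media: "screen"`;
2. `Page.printToPDF` with `paperWidth`/`paperHeight`, which are given in inches.

The sketch below only builds the JSON messages. The WebSocket transport to a page target (e.g. via `--remote-debugging-port`) is omitted, and the 1600x18000 size is the example from the thread. CSS defines 1in as 96px, so that page comes to about 16.67in by 187.5in.

```python
import json
from itertools import count

CSS_PX_PER_INCH = 96  # CSS defines 1in = 96px

_ids = count(1)

def cdp(method, **params):
    """Serialize one Chrome DevTools Protocol command as a JSON message."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})

def screen_pdf_commands(width_px, height_px):
    """Print with screen CSS onto one custom-sized page (sketch; no transport)."""
    return [
        cdp("Emulation.setEmulatedMedia", media="screen"),
        cdp("Page.printToPDF",
            printBackground=True,
            paperWidth=width_px / CSS_PX_PER_INCH,
            paperHeight=height_px / CSS_PX_PER_INCH,
            marginTop=0, marginBottom=0, marginLeft=0, marginRight=0),
    ]
```

`Page.printToPDF` returns the PDF base64-encoded in its `data` field, which can then go through the PDF-to-SVG step discussed earlier in the thread.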
        
       | aidos wrote:
       | I've been down a bit of this rabbit hole before. We work with
       | PDFs, svgs, fonts and chromium too. While I don't have any need
       | for this tool itself, I'd highly recommend flicking through this
       | article as a nice overview of the graphics / font pipeline.
        
       | bscphil wrote:
       | Semi-related, there's this browser extension that somehow manages
       | to mangle HTML into SVG with pretty good accuracy.
       | https://addons.mozilla.org/en-US/firefox/addon/svg-screensho...
       | 
       | I do stuff like this (vector representations of the DOM) for
       | taking screenshots. Why?
       | 
       | 1. High resolution screenshots are great when you're sharing from
       | a low resolution device, or when you need to scale them up. I've
       | seen enough crappy screenshots of Twitter in YouTube videos to
       | last me the rest of my life.
       | 
       | 2. If your device does sub-pixel anti-aliasing, then your
       | screenshots all have noticeable color fringing around their text.
       | The text rendering is done well before the data hits the buffer
       | that the screenshot is capturing. A fun party trick is to
       | identify someone's OS based purely on a screenshot of some text
       | on a webpage.
       | 
       | 3. On Linux (and maybe elsewhere, IDK), color correction (e.g.
       | gamut mapping) is done (in X11) before the pixels get to the
       | buffer that you capture. So with most screenshot tools, you end
       | up capturing a bunch of distorted colors which you then have to
       | map back to sRGB if you want them to look right in color
       | calibrated software.
       | 
       | You can frequently get away with printing a PDF and then
       | rendering that out to a large PNG. In some cases, though,
       | figuring out how to set the page size to match what you see on
       | the screen can be near-impossible, and, more importantly, in
       | Firefox there's no way to disable print media CSS when printing a
       | PDF. (You can do this in Chromium.) If you need to edit the image
       | afterwards or want to put it on a website or something, this is
       | far easier to do with the SVG format than with PDF.
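The sub-pixel anti-aliasing point in item 2 can be checked mechanically. Grayscale anti-aliasing keeps the three channels of edge pixels roughly equal, while sub-pixel rendering tints them. Below is a rough heuristic sketch. It assumes dark text on a white background and pixels already decoded to `(r, g, b)` tuples, e.g. by an image library (loading is not shown). The `tolerance` value and the function name are arbitrary choices, not a known standard.

```python
def fringe_ratio(pixels, tolerance=8):
    """Fraction of anti-aliased edge pixels whose channels differ noticeably.

    Sketch only: pure white (background) and pure black (solid ink) pixels
    are skipped; a high ratio suggests sub-pixel (colored) anti-aliasing.
    `pixels` is an iterable of (r, g, b) tuples in 0..255.
    """
    edge = colored = 0
    for r, g, b in pixels:
        if max(r, g, b) == 0 or min(r, g, b) == 255:
            continue
        edge += 1
        if max(r, g, b) - min(r, g, b) > tolerance:
            colored += 1
    return colored / edge if edge else 0.0
```

A screenshot of grayscale-rendered text should score near 0. ClearType-style text should score much higher. A vector capture avoids the question entirely, because no pixels are captured.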
        
       | jancsika wrote:
       | > Recently, an experimental SVG back-end has been added to Skia.
       | 
       | That's curious.
       | 
       | Anyone know why?
        
         | return_to_monke wrote:
         | While I am not a Skia person, a use case I could imagine is
         | (Flutter) web apps.
         | 
         | Flutter currently has 2 ways to run something on the web:
         | 
         | 1. CanvasKit. Primarily, this uses WebGL, though the app has to
         | download a kind of WebGL runtime on the first launch, iirc. If
         | the browser does not support OpenGL, it will use Skia with a
         | Canvas frontend, leading to blurry, poorly performing results.
         | 
         | 2. webRender. This is Flutter's way of trying to make an HTML
         | DOM, but it's not that great either. It's inconsistent with the
         | rest of the Flutter implementations, and has performance issues
         | because it's not really mature/optimized and has a virtual DOM.
         | 
         | I think an exciting use case would be in something like option 1:
         | instead of the blurry image and bad performance of canvas
         | redrawing, it might try to manipulate an SVG in the browser. This
         | is pure speculation tho, correct me if I'm wrong.
        
         | TheRealPomax wrote:
         | Calling it "recently" is a bit of a misnomer. The
         | "experimental/svg/model/..." content was added almost five
         | years ago.
        
       | simpleintheory wrote:
       | Interesting. I wonder how easy it would be to generalise this,
       | turning it into an API that gives out some image data that could in
       | turn be converted to PDF, SVG, PNG, you name it... though I'm not
       | sure how the data would be structured.
        
         | imhoguy wrote:
         | You can make PDF or PNG from SVG.
        
         | fathyb wrote:
         | I had a lot of people reach out this weekend for PDF support,
         | so I'm planning on implementing it with PNG support this week.
         | Thanks to Skia, it should just require a few lines of code.
        
           | justinclift wrote:
           | PDF or PNG? It's not clear from your comment. :)
        
       ___________________________________________________________________
       (page generated 2022-11-13 23:00 UTC)