Reposted

How can an article's images be cached automatically?

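Foreword: Most of you have probably published an article or written a blog post at some point. How do the various platforms handle the images in those articles? Qzone (QQ空间), for example, caches every image in an article on QQ's own servers. How would you implement that feature yourself? Have you ever thought about it?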

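With that in mind, let's work through the question "How can an article's images be cached automatically?" First, "automatically" means no manual intervention: the program itself has to detect whether an article contains images. Second, caching an image means downloading it, and downloading raises the question of how to cope with sites that block crawlers. Following this line of thought, the work breaks down into the steps below.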

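1. The program runs a scheduled task, for example scanning every 30 seconds for articles that contain images needing replacement (images not hosted on this site).
2. Parse the image URLs in the article (this needs a regular expression or an HTML parsing tool such as jsoup).
3. Download the images automatically to the local server (this needs a networking library; for some sites a proxy may be required).
4. Replace the URLs in the original article (regex replacement or plain string replacement).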

1. Run the program as a scheduled task

 <bean id="downloadImageTime" class="org.springframework.scheduling.quartz.CronTriggerBean">
 <property name="jobDetail" ref="downloadImageCron" />
 <property name="cronExpression" value="0/30 * * * * ?" /> 
 </bean>
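
The XML above only wires up the trigger; the job it points to (`downloadImageCron`) is not shown in the article. As a rough, framework-free illustration of the same idea, the sketch below repeats a task at a fixed interval using the JDK's `ScheduledExecutorService`. The class name `ImageCacheScheduler` and the task passed to it are hypothetical stand-ins, not the author's code. Note also that a fixed delay is only an approximation of the cron expression, which fires at seconds 0 and 30 of every minute.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the Quartz trigger: repeats a scan task at a fixed interval.
final class ImageCacheScheduler {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    // Runs scanTask immediately, then again periodMillis after each run finishes.
    ScheduledFuture<?> start(Runnable scanTask, long periodMillis) {
        Runnable safeTask = () -> {
            try {
                scanTask.run();
            } catch (RuntimeException e) {
                // An exception escaping the task would cancel all later runs,
                // so it is logged here and the schedule keeps going.
                e.printStackTrace();
            }
        };
        return executor.scheduleWithFixedDelay(safeTask, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Stops the schedule and interrupts a run that is in progress.
    void stop() {
        executor.shutdownNow();
    }
}
```

A call such as `new ImageCacheScheduler().start(task, 30_000)` would then run `task` roughly every 30 seconds, where `task` is whatever scans the articles.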

2. Parse the image URLs in the article

 // Requires jsoup: org.jsoup.nodes.Document, org.jsoup.nodes.Element, org.jsoup.select.Elements
 public void addImageUrl(Document doc, BsArticle bsarticle) {
     // Select every <img> inside the article body
     Elements elements = doc.select("div.article_body").select("img");
     for (Element element : elements) {
         String imageUrl = element.attr("src");
         // Record the image as pending (processFlag = 0) so the download task can pick it up later
         BsReplaceimage bsReplaceimage = new BsReplaceimage();
         bsReplaceimage.setOldImageurl(processsImage(imageUrl));
         bsReplaceimage.setProcessFlag(0);
         bsReplaceimage.setArticleId(bsarticle.getId());
         try {
             replaceImageService.addReplaceImage(bsReplaceimage);
         } catch (Exception e) {
             e.printStackTrace();
         }
     }
 }
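
Step 2 also mentions regular expressions as an alternative to jsoup. The sketch below shows one way that could look using only the JDK: it pulls the `src` of each `<img>` tag and keeps only URLs whose host differs from the site's own. The class name `ExternalImageFinder`, its method names, and the `siteHost` parameter are assumptions made for illustration. A regex like this handles simple markup only, so an HTML parser is still the safer choice for arbitrary pages.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds <img> URLs that are not hosted on the site itself.
final class ExternalImageFinder {
    // Captures the quoted value of a src attribute inside an <img> tag.
    // Requiring whitespace before "src" skips attributes such as data-src.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    static List<String> findExternalImages(String html, String siteHost) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = IMG_SRC.matcher(html);
        while (matcher.find()) {
            String src = matcher.group(2).trim();
            if (isExternal(src, siteHost)) {
                urls.add(src);
            }
        }
        return urls;
    }

    // A URL counts as external when it has a host and that host is not siteHost.
    // Relative paths and data: URIs have no host, so they are treated as local.
    static boolean isExternal(String src, String siteHost) {
        try {
            String host = URI.create(src).getHost();
            return host != null && !host.equalsIgnoreCase(siteHost);
        } catch (IllegalArgumentException e) {
            return false; // not a parseable URI, leave it alone
        }
    }
}
```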

3. Download the images automatically to the local server

 import java.io.File;
 import java.io.FileOutputStream;
 import java.io.IOException;
 import java.io.InputStream;
 import java.net.HttpURLConnection;
 import java.net.MalformedURLException;
 import java.net.URL;

 // Both methods live in a utility class that also defines the `proxy` field
 // and the parseSuffix(String) helper; neither is shown in this article.

 /**
  * Downloads an image: writes the InputStream obtained from the remote
  * server to the local disk.
  * Created 2016-07-08 11:06:25 by liuhaihua (v1.00).
  *
  * @param oldurlString URL of the remote image
  * @param filename     file name to save under, without extension
  * @param savePath     directory to save into, relative to /alidata1/ftp
  * @return the new relative URL of the image, or "" if no stream could be opened
  */
 public static String downImage(String oldurlString, String filename, String savePath) {
     String newurlString = "";
     InputStream inputStream = getInputStream(oldurlString);
     if (inputStream == null) {
         // getInputStream returns null on a non-200 response or a connection
         // error; reading from it would throw a NullPointerException.
         return newurlString;
     }
     byte[] data = new byte[1024];
     int len = 0;
     FileOutputStream fileOutputStream = null;
     try {
         String absolute_path = "/alidata1/ftp" + savePath;
         File sf = new File(absolute_path);
         if (!sf.exists()) {
             sf.mkdirs();
         }
         newurlString = savePath + "/" + filename + "." + parseSuffix(oldurlString);
         fileOutputStream = new FileOutputStream(sf.getPath() + "/" + filename + "." + parseSuffix(oldurlString));
         while ((len = inputStream.read(data)) != -1) {
             fileOutputStream.write(data, 0, len);
         }
     } catch (IOException e) {
         e.printStackTrace();
     } finally {
         try {
             inputStream.close();
         } catch (IOException e) {
             e.printStackTrace();
         }
         if (fileOutputStream != null) {
             try {
                 fileOutputStream.close();
             } catch (IOException e) {
                 e.printStackTrace();
             }
         }
     }
     return newurlString;
 }

 /**
  * Obtains an input stream from the remote server (here, the stream of an image).
  * Created 2016-07-08 11:07:00 by liuhaihua (v1.00).
  *
  * @param URL_PATH URL of the remote resource
  * @return the response body stream, or null if the request did not return 200
  */
 public static InputStream getInputStream(String URL_PATH) {
     InputStream inputStream = null;
     HttpURLConnection httpURLConnection = null;
     try {
         URL url = new URL(URL_PATH);
         httpURLConnection = (HttpURLConnection) url.openConnection(proxy);
         // Set the connection timeout (milliseconds)
         httpURLConnection.setConnectTimeout(3000);
         // The application intends to read data from the connection
         httpURLConnection.setDoInput(true);
         httpURLConnection.setRequestMethod("GET");
         int responseCode = httpURLConnection.getResponseCode();
         if (responseCode == 200) {
             // Get the input stream returned by the server
             inputStream = httpURLConnection.getInputStream();
         }
     } catch (MalformedURLException e) {
         e.printStackTrace();
     } catch (IOException e) {
         e.printStackTrace();
     }
     return inputStream;
 }
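
The download code calls `parseSuffix`, which the article does not show. The sketch below is one guess at what such a helper might do: take the file extension from the URL, ignoring any query string or fragment. The class name `ImageSuffix`, the fallback to `jpg`, and the limit on extension length are all assumptions, not the author's implementation.

```java
import java.util.Locale;

// Hypothetical version of the parseSuffix helper that the article calls but does not show.
final class ImageSuffix {
    private static final String DEFAULT_SUFFIX = "jpg"; // assumed fallback

    static String parseSuffix(String url) {
        // Cut off the query string and the fragment, whichever comes first.
        int cut = url.length();
        int query = url.indexOf('?');
        if (query >= 0) cut = Math.min(cut, query);
        int fragment = url.indexOf('#');
        if (fragment >= 0) cut = Math.min(cut, fragment);
        String path = url.substring(0, cut);

        // A URL such as http://host has no path, so there is no file name to inspect.
        int schemeEnd = path.indexOf("://");
        int lastSlash = path.lastIndexOf('/');
        if (schemeEnd >= 0 && lastSlash < schemeEnd + 3) return DEFAULT_SUFFIX;

        String name = path.substring(lastSlash + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) return DEFAULT_SUFFIX;

        String suffix = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        // Accept only short alphanumeric extensions; anything else gets the fallback.
        return suffix.matches("[a-z0-9]{1,5}") ? suffix : DEFAULT_SUFFIX;
    }
}
```

Guessing the type from the URL is a simplification; a stricter version could look at the `Content-Type` response header instead.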

4. Replace the URLs in the original article

 System.out.println("oldurl--------------------:" + oldurl);
 System.out.println("newurl--------------------:" + newurl);
 String newcontent = content.replace(oldurl, newurl);
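
The snippet above replaces a single URL. When an article has several images, the same replacement has to run once per image. The sketch below assumes the old-to-new mapping has been collected into a `Map`; the class name `ImageUrlReplacer` and its method are hypothetical. It replaces longer URLs first, so that a URL which happens to be a prefix of another one does not corrupt the longer one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: applies every old-to-new image URL replacement to an article body.
final class ImageUrlReplacer {
    static String replaceAll(String content, Map<String, String> oldToNew) {
        List<String> oldUrls = new ArrayList<>(oldToNew.keySet());
        // Longest URL first, so a shorter URL that is a prefix of a longer one
        // cannot rewrite part of the longer URL before it is handled.
        oldUrls.sort((a, b) -> Integer.compare(b.length(), a.length()));

        String result = content;
        for (String oldUrl : oldUrls) {
            // String.replace does literal (non-regex) replacement of all occurrences.
            result = result.replace(oldUrl, oldToNew.get(oldUrl));
        }
        return result;
    }
}
```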
