Website Architecture: Database Sharding Design

2019-03-27 01:14 | Source: Web

Reposted from http://www.infoq.com/cn/articles/yupoo-partition-database

Yupoo (又拍网) is a photo-sharing community. From June 2005 to today it has accumulated 2.6 million users and 110 million photos, and currently serves more than 2 million visits per day. Over five years of development we have been through plenty of ups and downs and accumulated some experience; in this article I want to share some of what we have built up on the technical side.

Like most Web 2.0 sites, Yupoo is built on a large amount of open-source software, including MySQL, PHP, nginx, Python, memcached, redis, Solr, Hadoop, RabbitMQ and so on. Our server-side development languages are mainly PHP and Python: PHP is used for the web logic (talking to users directly over HTTP), while Python is mainly used for internal services and background tasks. On the client side we use a lot of JavaScript; here we have to thank the MooTools JS framework, which makes front-end development very enjoyable. In addition, we split image processing out of the PHP processes into a separate service. This service is based on nginx, implemented as an nginx module that exposes a REST API.

Development Languages

Figure 1: Development languages

Because of PHP's single-threaded model, we move time-consuming computation and I/O out of the HTTP request cycle and hand it to task workers implemented in Python, to keep response times low. These tasks mainly include sending email, indexing data, aggregating data and pushing friend activity feeds (described a bit later). Usually the tasks are triggered by users, and one user action may trigger several kinds of tasks. For example, when a user uploads a new photo we need to update the index and also push a new activity item to his friends. PHP triggers task execution through a message queue (we use RabbitMQ).
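The article does not show the publishing side, but as a rough sketch (not Yupoo's actual code), a PHP web process might hand a task to RabbitMQ roughly like this, here using the php-amqplib client; the queue name and message fields are assumptions:

<?php
    require 'vendor/autoload.php';   // Composer autoloader for php-amqplib (assumed setup)

    use PhpAmqpLib\Connection\AMQPStreamConnection;
    use PhpAmqpLib\Message\AMQPMessage;

    // Connect to the local RabbitMQ broker (credentials are placeholders).
    $connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
    $channel    = $connection->channel();

    // A durable queue that the Python workers consume from (name is hypothetical).
    $channel->queue_declare('photo_tasks', false, true, false, false);

    // One user action can fan out into several tasks, e.g. re-indexing and feed pushes.
    $task = json_encode(array(
        'type'     => 'photo_uploaded',
        'user_id'  => 1,
        'photo_id' => 10001,
    ));
    $channel->basic_publish(new AMQPMessage($task, array('delivery_mode' => 2)), '', 'photo_tasks');

    $channel->close();
    $connection->close();
?>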

Collaboration between PHP and Python

Figure 2: Collaboration between PHP and Python

The database is always the most challenging part of a website's architecture; the bottleneck usually shows up there. Yupoo stores a very large amount of photo data, and the database has come under serious pressure several times. So here I will mainly describe some of what Yupoo has tried in its sharding design.

Sharding Design

Like many Web 2.0 sites that use MySQL, Yupoo's MySQL cluster went through an evolution from one master and one slave at the beginning, to one master with multiple slaves, and then to multiple masters with multiple slaves.

Evolution of the database setup

Figure 3: Evolution of the database setup

The initial setup consisted of one master and one slave. The slave was only used for backup and disaster recovery: when the master failed, the slave was manually promoted to master; under normal circumstances the slave served no reads or writes (apart from replication). As the load grew we added memcached, at that time only to cache single rows. However, caching single rows did not relieve the pressure much, because single-row queries are usually fast anyway. So we moved queries without strict real-time requirements to the slave, and later added more slaves to spread the read load; but as the data volume grew, the write pressure on the master kept increasing.

After looking at some related products and what other sites were doing, we decided to split the database, that is, to store the data on different database servers. Data can generally be split along two dimensions:

Vertical partitioning: splitting by functional module, for example storing group-related tables and photo-related tables in different databases. With this approach the table schemas differ across the databases.

Horizontal partitioning: splitting the rows of one table across different databases, where the table schemas in all of the databases are exactly the same.

Partitioning approaches

Vertical partitioning is usually done first, because it is relatively simple to implement: you just route to a different database based on the table name. But vertical partitioning cannot solve every pressure problem, and whether it helps also depends on the type of application; when it fits, it spreads database load quite well. Douban, for example, looks like a good fit for vertical partitioning, because its core businesses/modules (books, movies, music) are relatively independent and their data grows at a fairly steady pace. Yupoo is different: our core business object is user-uploaded photos, and photo data grows faster and faster as the user base grows. Almost all of the pressure falls on the photo tables, so vertical partitioning obviously could not solve our problem at its root. That is why we adopted horizontal partitioning.

Partitioning rules

Horizontal partitioning is relatively complex to implement. We first have to settle on a partitioning rule, that is, the criterion by which the data is split. Web 2.0 sites are generally user-centric and most data follows the user: photos, friends, comments and so on. So a fairly natural choice is to partition by user. Each user maps to one database; when accessing a user's data we first determine which database he or she corresponds to, then connect to that database for the actual reads and writes.

So how do we map users to databases? We have these options:

Mapping by algorithm

The simplest algorithm maps users by the parity of their user ID: users with odd IDs go to database A and users with even IDs go to database B. The biggest problem with this is that it can only ever split into two databases. Another algorithm maps users by ID range: users with IDs between 0 and 10000 go to database A, users with IDs between 10000 and 20000 go to database B, and so on. Mapping by algorithm is easy to implement and efficient, but it cannot satisfy later scalability needs: adding database nodes requires either changing the algorithm or moving a large amount of data, and it is hard to add nodes without taking the service down.
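For illustration only (these helpers are not part of Yupoo's framework), the two algorithmic mappings described above might look like this:

<?php
    // Parity-based mapping: can only ever produce two databases.
    function db_by_parity($user_id)
    {
        return ($user_id % 2 == 0) ? 'db_a' : 'db_b';
    }

    // Range-based mapping: every 10000 IDs go to the next database.
    // Adding nodes later means changing the ranges or moving data around.
    function db_by_range($user_id)
    {
        $ranges = array('db_a', 'db_b', 'db_c');     // one entry per 10000-ID block
        $index  = (int) floor($user_id / 10000);
        return $ranges[min($index, count($ranges) - 1)];
    }
?>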

Mapping via an index/mapping table

This approach builds an index table that stores the mapping between each user's ID and a database ID; every read or write of a user's data first fetches the corresponding database from this table. When a new user registers, we randomly pick one of the available databases and create an index entry for the user. This method is flexible and scales well. One drawback is the extra database access, so its performance is not as good as mapping by algorithm.

After comparing the two we went with the index table: we are willing to trade a little performance for its flexibility, and besides, we still have memcached. Because the index data essentially never changes, the cache hit rate is very high, which recovers most of the performance loss.
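A minimal sketch of such a cached lookup, assuming a hypothetical index table user_db_map(user_id, db_id) in the global database and a memcached instance; none of these names are Yupoo's actual ones:

<?php
    function lookup_user_db($user_id, PDO $global_db, Memcached $cache)
    {
        $key   = 'user-db-' . $user_id;
        $db_id = $cache->get($key);
        if ($db_id === false) {
            // Cache miss: fall back to the index table in the global database.
            $stmt = $global_db->prepare('SELECT db_id FROM user_db_map WHERE user_id = ?');
            $stmt->execute(array($user_id));
            $db_id = $stmt->fetchColumn();
            // The mapping almost never changes, so cache it without expiry.
            $cache->set($key, $db_id, 0);
        }
        return $db_id;
    }
?>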

Data access flow

Figure 4: Data access flow

The index-table approach makes adding database nodes fairly easy: a new node just has to be added to the list of available databases. Of course, if the load needs to be balanced across nodes, data still has to be migrated, but at that point the migration is small and can be done gradually. To migrate user A's data, we first set the user's status to "migrating"; a user in this state cannot perform writes, and a notice is shown on the page. After all of user A's data has been copied to the new node, we update the mapping table, set user A's status back to normal, and finally delete the data on the original database. This usually happens in the small hours, so very few users ever see the "migrating" state.
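The migration steps described above could be sketched like this; every helper here is a hypothetical placeholder for a step in the text, not a real framework call:

<?php
    function migrate_user($user_id, $target_db_id)
    {
        // 1. Block writes for this user and show a notice on the site.
        set_user_status($user_id, 'migrating');

        // 2. Copy all of the user's rows to the new node.
        copy_user_data($user_id, $target_db_id);

        // 3. Point the mapping table (and its cache entry) at the new node.
        update_user_db_mapping($user_id, $target_db_id);

        // 4. Re-enable writes, then clean up the old copy.
        set_user_status($user_id, 'normal');
        delete_user_data_from_old_db($user_id);
    }
?>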

Of course, some data does not belong to any particular user, such as system messages and configuration; we keep that data in a global database.

Problems

Sharding creates plenty of trouble in both application development and deployment.

Cross-database joins are impossible

If the data we need is spread across different databases, we cannot fetch it with a JOIN. For example, to get the latest photos of a user's friends, you cannot guarantee that all of the friends' data sits in the same database. One workaround is to run multiple queries and aggregate the results; we try to avoid such requirements as much as possible. Some requirements can be met by storing multiple copies of the data. For example, suppose User-A's database is DB-1 and User-B's is DB-2. When User-A comments on one of User-B's photos, we store the comment in both DB-1 and DB-2: first we insert a new row into the photo_comments table in DB-2, then we insert a new row into the user_comments table in DB-1. The structure of the two tables is shown in the figure below. This way we can get all comments on one of User-B's photos by querying photo_comments, and all comments made by User-A by querying user_comments. In addition, a full-text search engine can cover some requirements; we use Solr for site-wide tag search and photo search.

Comment table structure

Figure 5: Comment table structure
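A minimal sketch of the double write described above, assuming two PDO connections already resolved to the photo owner's shard (DB-2) and the commenter's shard (DB-1); the column lists are simplified guesses based on the figure:

<?php
    function add_photo_comment(PDO $owner_db, PDO $commenter_db,
                               $comment_id, $photo_id, $owner_id, $commenter_id, $text)
    {
        // DB-2: comments attached to the photo, for "all comments on this photo".
        $stmt = $owner_db->prepare(
            'INSERT INTO photo_comments (comment_id, photo_id, author_id, content) VALUES (?, ?, ?, ?)');
        $stmt->execute(array($comment_id, $photo_id, $commenter_id, $text));

        // DB-1: comments attached to the commenter, for "all comments this user made".
        $stmt = $commenter_db->prepare(
            'INSERT INTO user_comments (comment_id, user_id, photo_id, photo_owner_id, content) VALUES (?, ?, ?, ?, ?)');
        $stmt->execute(array($comment_id, $commenter_id, $photo_id, $owner_id, $text));
    }
?>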

Data consistency/integrity cannot be guaranteed

Data that spans databases has no foreign-key constraints and no transactional guarantees. In the photo-comment example above, it is entirely possible that the insert into photo_comments succeeds while the insert into user_comments fails. One mitigation is to open a transaction on both databases, insert into photo_comments, then into user_comments, and finally commit both transactions. Even this cannot fully guarantee that the operation is atomic.
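A sketch of that best-effort pattern with PDO; as the text notes, it is still not atomic, since a failure between the two commits leaves the databases inconsistent:

<?php
    function add_comment_best_effort(PDO $owner_db, PDO $commenter_db, array $comment)
    {
        $owner_db->beginTransaction();
        $commenter_db->beginTransaction();
        try {
            insert_photo_comment($owner_db, $comment);      // hypothetical helpers wrapping the
            insert_user_comment($commenter_db, $comment);   // INSERTs from the previous sketch
        } catch (Exception $e) {
            $owner_db->rollBack();
            $commenter_db->rollBack();
            throw $e;
        }
        // Both inserts succeeded on their own connections; now commit both.
        // A crash between these two commits still leaves the data inconsistent.
        $owner_db->commit();
        $commenter_db->commit();
    }
?>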

Every query must provide a database hint

For example, to view a photo, the photo ID alone is not enough; the ID of the user who uploaded it (the database hint) must also be provided before we can find where the photo is actually stored. We therefore had to redesign many URLs, while keeping some old URLs working. We changed photo URLs to the form /photos/{username}/{photo_id}/, and for photos uploaded before the upgrade we added a mapping table that stores the photo_id to user_id correspondence. When an old photo URL is accessed, we look up the user in this table and then redirect to the new URL.
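A minimal sketch of that redirect path, assuming a hypothetical legacy mapping table old_photo_map(photo_id, user_id) in the global database:

<?php
    // Handle a legacy URL such as /photo/10001 by redirecting to the new scheme.
    function redirect_old_photo_url(PDO $global_db, $photo_id)
    {
        $stmt = $global_db->prepare('SELECT user_id FROM old_photo_map WHERE photo_id = ?');
        $stmt->execute(array($photo_id));
        $user_id = $stmt->fetchColumn();
        if ($user_id === false) {
            header('HTTP/1.1 404 Not Found');
            return;
        }
        $username = load_username($user_id);   // hypothetical lookup in the global users table
        header('Location: /photos/' . $username . '/' . $photo_id . '/', true, 301);
    }
?>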

Auto-increment IDs

If we use auto-increment columns on the shard databases, we cannot guarantee that the values are globally unique. That by itself is not a serious problem, but it becomes awkward as soon as data on different nodes gets related. Look again at the comment example above. Suppose comment_id in photo_comments is an auto-increment column. When we insert a new comment into DB-2.photo_comments we get a new comment_id, say 101; User-A's ID is 1, so we also need to insert (1, 101, ...) into DB-1.user_comments. User-A is a very active user and also comments on a photo of User-C, whose database is DB-3. By coincidence this new comment's ID is also 101 (which can very well happen), so we insert another row that looks like (1, 101, ...) into DB-1.user_comments. So what should the primary key of user_comments be (to identify a row)? We could leave it out, but unfortunately sometimes (because of frameworks, caching and so on) a primary key is required. We could use a composite key of user_id, comment_id and photo_id, but photo_id could also collide (quite a coincidence, admittedly). It seems we would have to add photo_owner_id as well, but that is hard to accept: such a complex composite key hurts write performance, and a "natural" key like this does not look natural at all. So we gave up on auto-increment columns on the shard nodes and instead made these IDs globally unique. For this we added a dedicated ID-generation database; its tables are very simple, each having only a single auto-increment id column. When we want to insert a new comment, we first insert an empty row into the photo_comments table in the ID database to obtain a globally unique comment ID. All of this logic is encapsulated in our framework and is transparent to developers. Why not use something else, such as a key-value store that supports an incr operation? We simply feel more comfortable keeping this data in MySQL. We also purge the ID database periodically to keep new-ID generation fast.
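A minimal sketch of such a ticket-style generator, assuming the ID database contains a photo_comments table with a single auto-increment id column, as described above:

<?php
    // Returns a globally unique comment ID by inserting an empty row
    // into the dedicated ID database and reading back the auto-increment value.
    function next_comment_id(PDO $id_db)
    {
        $id_db->exec('INSERT INTO photo_comments () VALUES ()');
        return (int) $id_db->lastInsertId();
    }
?>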

Implementation

We call the database node described above a Shard. A Shard consists of two physical servers, which we call Node-A and Node-B; Node-A and Node-B are configured as master-master replicas of each other. Although they are deployed master-master, we still only use one of them at any given time, because of replication lag. In the web application we could put an "A" or "B" marker in the user's session to guarantee that a user only hits one database within a session, which avoids most lag problems, but our Python tasks are completely stateless and cannot be guaranteed to read and write the same database as the PHP application. Then why not configure them as master-slave? We felt that using only one of the two servers would be a waste, so we create multiple logical databases on each server. As shown in the figure below, we create two logical databases, shard_001 and shard_002, on both Node-A and Node-B. Node-A's shard_001 and Node-B's shard_001 form one Shard, and at any given time only one of the two logical databases is Active. To access Shard-001's data we connect to shard_001 on Node-A, while to access Shard-002's data we connect to shard_002 on Node-B. This criss-cross arrangement spreads the load across both physical servers. Another benefit of the master-master deployment is that we can upgrade table schemas without stopping the service: we first stop replication, upgrade the Inactive databases, then upgrade the application, switch the upgraded databases to Active and the previously Active databases to Inactive, upgrade their schemas, and finally resume replication. Of course this procedure does not suit every upgrade; if a schema change would break replication, we still have to stop the service to upgrade.

Database Layout

Figure 6: Database layout
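For illustration, the shard-to-connection configuration behind Figure 6 might look roughly like this; the array layout is an assumption, not Yupoo's actual configuration format:

<?php
    // Each shard lives on both nodes; only one copy is Active at a time.
    // The criss-cross assignment spreads load across the two physical servers.
    $GLOBALS['shard_config'] = array(
        'shard_001' => array(
            'active'  => array('host' => 'node-a', 'db' => 'shard_001'),
            'standby' => array('host' => 'node-b', 'db' => 'shard_001'),
        ),
        'shard_002' => array(
            'active'  => array('host' => 'node-b', 'db' => 'shard_002'),
            'standby' => array('host' => 'node-a', 'db' => 'shard_002'),
        ),
    );
?>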

As mentioned earlier, when we add servers we need to migrate some data onto them to keep the load balanced. To avoid having to migrate too soon, in the actual deployment we create 8 logical databases on each machine; when new servers are added, we only have to move some of these logical databases onto them. Ideally we double the number of servers each time and move half of the logical databases on each machine to a new server, which balances the load nicely. Of course, once each machine is down to a single logical database, migration becomes unavoidable again, but that should be quite a long way off.

We have encapsulated all of the sharding logic in our PHP framework, so developers hardly ever have to deal with these chores. Below are some examples of reading and writing photo data with our framework:

<?php
    $Photos = new ShardedDBTable('Photos', 'yp_photos', 'user_id', array(
                'photo_id'    => array('type' => 'long', 'primary' => true, 'global_auto_increment' => true),
                'user_id'     => array('type' => 'long'),
                'title'       => array('type' => 'string'),
                'posted_date' => array('type' => 'date'),
            ));

    $photo = $Photos->new_object(array('user_id' => 1, 'title' => 'Workforme'));
    $photo->insert();

    // Load the photo with ID 10001; note that the first argument is the user ID
    $photo = $Photos->load(1, 10001);

    // Change photo attributes
    $photo->title = 'Database Sharding';
    $photo->update();

    // Delete the photo
    $photo->delete();

    // Fetch photos uploaded after 2010-06-01 by the user with ID 1
    $photos = $Photos->fetch(array('user_id' => 1, 'posted_date__gt' => '2010-06-01'));
?>

First we define a ShardedDBTable object; the whole API is exposed through this object. The first argument is the object type name: if the name is already defined, the previously defined object is returned (you can also obtain a previously defined table object with get_table('Photos')). The second argument is the underlying table name, and the third is the database-hint column; you will notice that every later API call requires a value for this column. The fourth argument defines the columns; the photo_id column has its global_auto_increment attribute set to true, which is the global auto-increment ID described earlier. Once this attribute is set, the framework takes care of ID generation.

If we want to access data in the global database, we define a DBTable object instead.

<?php
    $Users = new DBTable('Users', 'yp_users', array(
                'user_id'  => array('type' => 'long', 'primary' => true, 'auto_increment' => true),
                'username' => array('type' => 'string'),
            ));
?>

DBTable is the parent class of ShardedDBTable; apart from slightly different constructor arguments (DBTable needs no database-hint column), they provide the same API.

Caching

Our framework also provides a caching layer that is transparent to developers.

<?php
    $photo = $Photos->load(1, 10001);
?>

For the call above, the framework first looks in the cache using Photos-1-10001 as the key; if nothing is found, it runs the database query and puts the result into the cache. When a photo's attributes are changed or the photo is deleted, the framework removes it from the cache. Caching single objects like this is straightforward. Slightly trickier is caching the results of list queries like the one below.

<?php
    $photos = $Photos->fetch(array('user_id' => 1, 'posted_date__gt' => '2010-06-01'));
?>

We split this query into two steps: first fetch the IDs of the photos that match the conditions, then fetch the photo details by ID. This makes better use of the cache. The cache key for the first query is Photos-list-{shard_key}-{md5(SQL of the query conditions)}, and the value is the comma-separated list of photo IDs; here shard_key is the user_id value 1. So far the list cache does not look too bad. But what if the user changes the upload time of one of the photos? The cached list may no longer match the query conditions, so we need a mechanism that guarantees we never read stale lists from the cache. We give every table a revision: whenever the table's data changes (insert/update/delete is called), we bump its revision. We therefore change the list cache key to Photos-list-{shard_key}-{md5(SQL of the query conditions)}-{revision}, and stale lists can no longer be returned.

The revision itself is also stored in the cache, under the key Photos-revision. This looks fine, but the hit rate of the list cache would be rather low, because the revision is scoped to the whole data type: it changes very frequently, since any user modifying or uploading a photo bumps it, even if that user is not in the shard we are querying. To isolate users from each other's actions, we narrow the scope of the revision: its cache key becomes Photos-{shard_key}-revision, so when the user with ID 1 modifies his photos, only the revision under the key Photos-1-revision is bumped.
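A rough sketch of that revision-scoped list cache, written outside the real framework; the key formats follow the text, and using a timestamp as the revision value is an assumption:

<?php
    function photos_revision(Memcached $cache, $shard_key)
    {
        $rev = $cache->get('Photos-' . $shard_key . '-revision');
        if ($rev === false) {
            $rev = time();                                          // missing revision: start a new one
            $cache->set('Photos-' . $shard_key . '-revision', $rev, 0);
        }
        return $rev;
    }

    function bump_photos_revision(Memcached $cache, $shard_key)
    {
        // Called from insert/update/delete; invalidates only this user's list caches.
        $cache->set('Photos-' . $shard_key . '-revision', time(), 0);
    }

    function photos_list_cache_key(Memcached $cache, $shard_key, $where_sql)
    {
        return 'Photos-list-' . $shard_key . '-' . md5($where_sql)
             . '-' . photos_revision($cache, $shard_key);
    }
?>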

Because the global database has no shard_key, changing one row of a global table still invalidates the cache of the whole table. But most data has some natural scope. Take the threads in our help forum: posts belong to topics, and editing one post in one topic should not invalidate the cached post lists of every other topic. So we added an attribute called isolate_key to DBTable.

<?php
$GLOBALS['Posts'] = new DBTable('Posts', 'yp_posts', array(
        'topic_id'    => array('type' => 'long', 'primary' => true),
        'post_id'     => array('type' => 'long', 'primary' => true, 'auto_increment' => true),
        'author_id'   => array('type' => 'long'),
        'content'     => array('type' => 'string'),
        'posted_at'   => array('type' => 'datetime'),
        'modified_at' => array('type' => 'datetime'),
        'modified_by' => array('type' => 'long'),
    ), 'topic_id');
?>

Note the last constructor argument, topic_id: it makes the topic_id column the isolate_key, which works like the shard_key to limit the scope of the revision.

ShardedDBTable inherits from DBTable, so it can also specify an isolate_key, which narrows the scope of the revision even further. Take yp_album_photos, the table linking albums and photos: when a user adds new photos to one of his albums, the cached photo lists of his other albums would be invalidated too. If we set this table's isolate_key to album_id, the impact is confined to that one album.

Our cache has two levels: the first level is just a PHP array whose lifetime is the request, and the second level is memcached. The reason is that a lot of data is loaded several times within one request, so the first level saves round trips to memcached. In addition, the framework fetches data from memcached with batched gets commands wherever possible, which further reduces network requests.

Summary

This architecture has kept us free from database pressure problems for quite a long time. Many parts of our design drew on the implementations of Netlog and Flickr, and we are very grateful to them for publishing those details.

About the author:

Zhou Zhaozhao (Zola, not the one you are thinking of) is the architect of Yupoo. He has six years of experience in the IT industry; rather than focusing on one particular technology, he is interested in many of them.

Reposted from http://www.jurriaanpersyn.com/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/

Database Sharding at Netlog, with MySQL and PHP

This article accompanies the slides from a presentation on database sharding. Sharding is a technique for the horizontal scaling of databases that we use at Netlog. If you're interested in high performance, scalability, MySQL, php, caching, partitioning, Sphinx, federation or Netlog, read on ...

This presentation was given at the second day of FOSDEM 2009 in Brussels. FOSDEM is an annual conference on open source software with about 5000 hackers. I was invited by Kris Buytaert and Lenz Grimmer to give a talk in the MySQL Dev Room. The talk was based on an earlier talk I gave at BarcampGent 2.

Overview

 

Who am I?

Currently I am a Lead Web Developer at Netlog working with php, MySQL and other frontend technologies to develop and improve the features of our social network. I've been doing this for 3 years now. For this paper it is important to mention that I am neither a DBA nor a sys-admin, so I approach the problem of scaling databases from an application / developer point of view.
Of course the solutions presented in this presentation are the result of a lot of effort from the Development and IT Services Department at Netlog.

What is Netlog?

For those of you who are unfamiliar with Netlog, it's best to sketch a little overview of who and what we are, and especially where we come from in terms of userbase and growth. It will let you see things in perspective regarding scalability. At the moment we have over 40 million active members, resulting in over 50 million unique visitors per month. This adds up to 5+ billion page views per month and 6 billion online minutes per month. We're active in 26 languages and 30+ countries with our 5 most active countries being Italy, Belgium, Turkey, Switzerland and Germany. (If you're interested in more info about the company, check our About Pages and sign-up for an account.)

In terms of database statistics, this type of usage results among others in huge amounts of data to store (eg. 100+ million friendships for nl.netlog.com). The nature of our application (lots of interaction) results in a very write-heavy app (with a read-write ratio of about 1.4 to 1). A typical database, before sharding, had an average of 3000+ queries per second during peak time (15h - 22h local time, for nl.netlog.com).
Of course, these requirements do not have to be met by every application, and different applications require different scaling strategies. Nevertheless we wouldn't have thought (or hoped) to be where we are today, when we started off 7 years ago as a college student project. We are convinced that we can give you further insight into scalability and share some valuable suggestions.
Below is a graph of our growth in the last year.

This growth has of course resulted in several performance issues. The bottleneck for us has often been the database layer, because this layer is the only layer in the web stack that isn't stateless. The interactions and dependencies in a relational database system make scaling horizontally less evident.

Netlog is (being) built and runs on open source software such as php, MySQL, Apache, Debian, Memcached, Sphinx, Lighttpd, Squid, and many more. Our solutions for scaling databases are also built on these technologies. That's why we want to give something back by documenting and sharing our story.

A history of scaling database systems

As every hobby project, Netlog (then asl.to, "your internet passport") started off, more than 7 years ago, with a single database instance on a - probably virtual - server in a shared hosting environment. As traffic grew and load increased, we moved to a separate server, with eventually a split setup for MySQL and php (database setup 1).

Database Setup 1: Master (W)

A next step to be taken was introducing new databases configured as "slaves" of the "master" database. Because a single MySQL server couldn't serve all the requests from our application, we distributed the read and write traffic to separate hosts. Setting up a slave is pretty easy through MySQL's replication features. What happens in a master-slave configuration is that you direct all write-queries (INSERT/UPDATE/DELETE) to the master database and all (or most) read queries to one or more slave databases. Slave databases are typically kept in sync with the master by reading the binlog files of the master and replaying all write-queries (database setup 2).
Problems to tackle for this set-up include increased complexity for your DBA-team (that needs to monitor multiple servers), and the possibility of "replication lag"; your slaves might get out-of-sync with the master database (because of locking read-queries, downtime, inferior hardware, etc.), resulting in out-of-date results being returned when querying the slave databases.
Real-time results are not required in every situation; however, you'll have situations where you have to force some read-queries to your master database to ensure data integrity. Otherwise you will end up with the painful consequences of (possible) race conditions.

 

Database Setup 2: Master (W) + Slaves (R)

A good idea for the master-slave set-up is to introduce roles for your slaves. Typically you might assign all search, backend and/or statistics related queries to a "search-slave", where you don't care that much about replication lag, since real time results are seldom required for those kinds of use cases.

This system works especially well for read-heavy applications. Say you've got a server load of 100% and a read/write ratio of 4/1, your master server will be executing SELECT-queries 80% of the time. If you add a single slave to this set-up, the SELECT capacity doubles and you can handle twice the amount of SELECT-queries.
But in a write-heavy application, or a situation where your master database is executing write-queries for 90% of the time, you'll only win another 10% capacity by adding another slave, since your slaves will be busy syncing with their master for about 90% of the time. The problem here is that you're only distributing read traffic and no write traffic. In fact you're replicating the write traffic. Considering the fact that the efficiency of a Master-Slave setup is limited, you end up with lots of identical copies of your data.

At this point you'll have to start thinking about distributing write traffic. The heavier your application relies on write traffic, the sooner you'll have to deal with this. A simple, and straightforward, first step is to start partitioning your application on feature-level. This process is called vertical partitioning.
In your application you identify features (and by that MySQL tables) that more or less can exist on separate servers. If you have tables that are unrelated and don't require JOINs, why not put them on separate servers? For Netlog we have been able to put most of the tables containing details about the items of a user (eg. photos, blogs, videos, polls, ...) on separate servers. By replicating some important tables (eg. a table with userids, nicknames, etc.) to all separate partitions, you can still access and JOIN with those tables if you might need to.
In database setup 3, you see an example where we don't bother our master database anymore for friends or messages related queries. The write and read queries for these features go directly to the database responsible for that feature. These feature-specific hosts are still configured as slaves of the "TOP" master database, because that way we can replicate a few of those really important tables.
A good idea here is to split up the tables for OLAP use cases (data warehouses) from OLTP use cases (front-end, real time features), since these require a different approach and have different needs regarding speed and uptime, anyways.

Database Setup 3: Vertical Partitioning

What we did in setup 2, can be easily repeated for each of the vertically partitioned features. If any of your databases have trouble keeping up with the traffic requirements, configure a slave for that database and distribute the read and write traffic. This way you create a tree of databases replicating some tables through the whole system and a database class responsible for distributing the right queries to the right databases (database setup 4).

Database Setup 4: Vertical Partitioning / Replication Tree

If necessary, you might dive deeper into your application and find more features to partition. Unfortunately this will become harder and harder, because with every feature you split up, you again lose some JOIN-functionality you might want or need. And, sometimes, you're even stuck with a single table that's growing too large and grows beyond what a single database host can easily manage. The first feature to hit this single-table-on-a-single-database limit, was a table with friendships between our users. This table grew so rapidly that the performance and uptime of the host responsible for this feature wasn't guaranteed anymore, no matter how many slaves we added to it. Of course you can always choose to scale up, instead of scale out, by buying boxes with an incredibly insane hardware setup, but apart from being nice to have, they're expensive and they'll still hit limits if you continue growing.
This approach to scaling has a limit (database setup 5) and if you hit that limit, you have to rethink your scaling strategy.

Database Setup 5: Hitting Limits

So, what's next?

What could we do now? Vertical partitioning has helped us a great deal, but we are stuck. Does master-to-master replication help? Will a cluster set-up help? Not really; these systems are designed for high availability and high performance. They work by replicating data and don't offer solutions for distributing write traffic.
What about caching? Oh, how can we forget about caching! Of course, caching will help a great deal in lowering the load on your database servers. (The read/write-ratio mentioned earlier would be completely different if we did no caching.) But the same problem remains: caching will lower the read traffic on your databases, but doesn't offer a solution for write traffic. Caching will delay the moment your database is only returning "1040 Too many connections" errors, but no matter how good your caching strategy is, it can't prevent your visitor metrics going nuts at some point.

The Holy Grail!

You can't split a table vertically, but can you easily split it horizontally? Sharding a table is putting several groups of records of that table in separate places (be it physically or not). You cut your data into arbitrarily sized pieces / fragments / shards and distribute them over several database hosts. Instead of putting all 100+ million friendships records on 1 big expensive machine, put 10 million friendships on each of 10 smaller and cheaper machines.

Sharding, or horizontal partitioning, is a term that was already in active use in 1996 in the MMO (Massive Multiplayer Online) Games world. If you're searching for info on sharding, you'll see it's a technique used by, among others, Flickr, LiveJournal, Sun and Netlog.

Sharding a photos table over 10 servers with a modulo partitioning scheme

In the image above you see an example of splitting up a photos-table over 10 different servers. The algorithm that's used to decide where your data goes or where you can access your data is eg. a modulo function on the userid of the owner of that photo. If you know the owner of a photo, you then know where to fetch the photo's other details, fetch its comments, etc.
Let's have a look at another simple example.

  • Use case: a simple blog site.
  • We've got a table with blog posts, with these columns: postid, title, message, dateadd, authorid
  • authorid is a FK (foreign key) to a users table
  • We shard the blog posts table (because our authors have been very productive writers) over 2 databases.
  • Posts from authors with an even authorid go to database 1.
  • Posts from authors with an odd authorid go to database 2.
  • Query: "Give me the blog messages from author with id 26."

In a non sharded environment, somewhere in your application, you'd find code that looks like this:

PHP:
    $db = DB::getInstance();                                                              // fetch a database instance
    $db->prepare("SELECT title, message FROM BLOG_MESSAGES WHERE userid = {userID}");     // prepare a query
    $db->assignInt('userID', $userID);                                                    // assign query variables
    $db->execute();                                                                       // execute the query
    $results = $db->getResults();                                                         // fetch an array of results

 

In this example we first fetch an instance of our database class that connects to our database. We then prepare a query, assign the variables (here the id of the author $userID), execute the query and fetch the resultset. If we introduce sharding based on the author's $userID, the database we need to execute this query on depends on that $userID (whether or not it is an even number). An approach to handle this could be to include the logic of "which user is on which database" into our database class and pass on that $userID to that class. You could end up with something like this: you pass on the $userID to the DB::getInstance() function, which then returns an object with the connection details based on the result of $userID % 2:

PHP:
    $db = DB::getInstance($userID);   // fetch a database instance, specific for this user
    $db->prepare("SELECT title, message FROM BLOG_MESSAGES WHERE userid = {userID}");
    $db->assignInt('userID', $userID);
    $db->execute();
    $results = $db->getResults();

 

Instead of passing the $userID as a parameter to your DB-class, you could try to parse it from the prepared query you supply to your class, or you could do your calculation of which DB connection you need on a different level, but the key concept remains the same: you need to pass some extra information to your database class to know where to execute the query. That is one of the most challenging requirements that has to be met for successful sharding.

How to shard your data?

When you want to split up your data two questions spring to mind: which property of the data (which column of the table) will I use to make the decisions on where the data should go? And what will the algorithm be? Let's call the first one the "sharding/partitioning key", and the second one the "sharding/partitioning scheme".

Which sharding key will be used is basically a decision that depends on the nature of your application, or the way you'll want to access your data. In the blog example, if you display overviews of blog messages per author, it's a good idea to shard on the author's $userID. If your site's navigation is instead through archives per month or per category, it might be smarter to shard on publication date or $categoryID. (If your application requires both approaches it might even be a good idea to set up a dual system with sharding on both keys.)

What you can do with the "shard key" to find its corresponding shard basically falls into 4 categories:

  • Vertical Partitioning: Splitting up your data on feature/table level can be seen as a kind of sharding, where the "shard key" is eg. the table name. As mentioned earlier this way of sharding is pretty straightforward to implement and has a relatively low impact on the application on the whole.
  • Range-based Partitioning: In range based partitioning you split up your data according to several ranges. Blog posts from 2000 and before go to database 1, blog posts from the new millennium go to the other database. This approach is typical for logging or other time based data. Other examples of range based partitioning could include federating users according to the first number of their postal code.
  • Key or Hash based Partitioning: The modulo-function used in the photos example is a way of partitioning your data based on hashing or other mathematical functions of the key. In the simple example of a modulo function you can use your number of shards for the modulo-operation. Of course, changing your number of shards would mean rebalancing your data. This might be a slow process. A way to solve this is to use a more consistent hashing mechanism, or choose the original number of your shards right and work with "virtual shards".
  • Directory based Partitioning: The last and most flexible scheme is where you have a directory lookup for each of the possible values of your shard key, mapped to a certain shard's id. This makes it possible to move all data from a certain shard key (eg. a certain user) from shard to shard, by altering the directory. A directory could on the other hand introduce more overhead or be a SPOF (Single Point Of Failure).

As shown in the blog example, you need to know your "shard key" before you can actually execute your query on the right database server. It means that the nature of your queries and application determines the way of partitioning your data. The demanded flexibility, the projected growth and the nature of your data will be other factors helping you decide on what scheme to use.
You also want to choose your keys and scheme so the data is optimally balanced over the databases and the load to each of the servers in the pool is equal.
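As a rough illustration of the four schemes above (all function, table and shard names here are made up for the example):

PHP:
    // Vertical: the "shard key" is effectively the table/feature name.
    function shard_for_feature($table)
    {
        $map = array('FRIENDS' => 'friends_db', 'MESSAGES' => 'messages_db');
        return isset($map[$table]) ? $map[$table] : 'top_master';
    }

    // Range-based: posts from 2000 and before on one shard, newer posts on another.
    function shard_for_post_year($year)
    {
        return ($year <= 2000) ? 'shard_1' : 'shard_2';
    }

    // Key/hash-based: modulo of the user id over the number of shards.
    function shard_for_user_modulo($userID, $numShards)
    {
        return 'shard_' . ($userID % $numShards);
    }

    // Directory-based: look the user up in a lookup table (here just an array).
    function shard_for_user_directory($userID, array $directory)
    {
        return $directory[$userID];   // e.g. array(26 => 'shard_5')
    }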

The end result of sharding your data should be that you have distributed write-queries to different independent databases, and that you end up with a system of more, but cheaper machines, that each have a smaller amount of the data and thus can process queries faster on smaller tables.
If you succeed, you're online again. Users appreciate it and your DBA is happy, because each of the machines in the setup now has less load and crashes less, so there is no tossing and turning through the nights. (Smaller tables mean faster queries, and that includes maintenance or ALTER queries, which again helps in keeping your DBA and developers happy.)

If there's a Holy Grail, there's a Killer Rabbit


Of course, sharding isn't the silver bullet of horizontal database scaling that will easily solve all your problems. Introducing sharding in your application comes with a significant cost of development. Here are some of its implications:

  • No cross-shard SQL queries: If you ever want to fetch data that (possibly) resides on different shards, you won't be able to do this with a JOIN on SQL-level. If you shard on $userID a JOIN with data from the same user is possible. However once you fetch results from several users on a shard, this will probably be an incomplete resultset. The key here is to design your application so there's no need for cross-shard queries. Other solutions could be the introduction of parallel querying on application level, but then of course you lose the aspect of distributing your database traffic. Depending on the use case, this could be a problem or not (eg. parallel querying for backend purposes is not as crazy as it may sound).
    Other options could be to denormalize your data and make some of the needed info available in several tables on several shards. You could duplicate the nickname of the author of a comment in the comments table to avoid having to do an extra query for that nickname. (The shard where you fetch the comment from, might be different than the shard where you'll find the nickname of the author.)
    If you have a table with guestbook messages that you want to shard, but require fetching both a list of messages by guestbook owner userid as on message poster userid, you could denormalize it by putting your messages (or references) on both the owner's and the poster's shard.
  • Data consistency and referential integrity: Since data from the same "table" resides on several stand-alone database servers, it becomes impossible to apply foreign keys, enforce globally unique auto_increment values or execute cross-shard transactions. This means you have to deal with enforcing integrity on the application level, and you might eventually end up spending a significant amount of your development time on check and fix routines.
    One way to reduce the integrity problems is to fake transactions across databases by starting a database transaction on two servers and only committing each of them once you know both servers are up. There will still be a delay in between the two commits of the transaction (which can then again cause problems), but it is one step closer to keeping your data healthy.
  • Balancing shards: If you shard on $userID, your sharding system might be(come) unbalanced because of power users versus inactive users. Not all hardware in your setup might have the same specs. And what if you add more shards, how will you be able to rebalance your setup? 
    Keeping the load on every database equal might take some effort. The choice of partitioning scheme is very important at this point. A directory based approach is the most flexible, but introduces overhead and a possible SPOF.
  • Is your network ready? Your application servers will now possibly fetch and store data on several different servers in one request, your network topology and configuration settings have to be ready. Will you keep connections open for the full page render? Or will you close the connection to your database after every query?
  • Your backup strategy will be different: Your actual data is fragmented over different servers affecting your backup strategy.

Existing solutions?

At the moment Netlog is the 67th most visited website in the world, according to Alexa's ranking. This means that there are at least 66 other websites out there probably facing similar problems to ours. 16 of the 20 most popular websites are powered by MySQL, so we are definitely not alone, are we?
Let's have a look at some of the existing technologies that implement or are somehow related to sharding and scaling database systems, and let's see which ones could be interesting for Netlog.

MySQL Cluster is one of the technologies you could think would solve similar problems. The truth is that a database cluster is helpful when it comes to high availability and performance, but it's not designed for the distribution of writes.

MySQL Partitioning is another relatively new feature in MySQL that allows for horizontal splitting of large tables into smaller and more performant pieces. The physical storage of these partitions is limited to a single database server though, making it not relevant for when a single table grows out of the capacities of a single database server.

HSCALE and Spock Proxy, which both build on MySQL Proxy, are two other projects that help in sharding your data. MySQL Proxy introduces Lua as an extra programming language to instruct the proxy (eg. for finding the right shard for this query). At the time we needed a solution for sharding, neither of these projects seemed to support directory based sharding the way we wanted it to.

HiveDB is a sharding framework for MySQL in Java that requires the Java Virtual Machine, with a php interface currently in its infancy. Being a Java solution makes it less interesting for us, since we prefer the technology we are experts in and that our application is written in: php.

Other technologies that aren't MySQL or php related include HyperTable (HQL), HBase, BigTable, Hibernate Shards (*shivers*), SQLAlchemy (for Python), Oracle RAC, etc ... The memcached SQL-functions or storage engine for MySQL is also a related project that we could mention here.

None of these projects really seemed to come in line with our requirements. But what exactly are they?

  • Flexible for the hardware department.
    We project growth and want the sharding system to be flexible. Knowing that our traffic will increase, we need to be able to add more shards quickly. With a growing amount of data, a proportional growth in hardware is required. For this reason we opt for a directory based partitioning scheme.
  • No massive rewrite.
    We can't introduce a whole new database layer or incompatible abstraction layer. We want to keep on using our database class as we do now and only implement sharding for those features that really require that amount of scaling. That's why we've opted for a solution that builds on what we have and allows for incremental implementation. We also wanted to be able to use the sharding API without the data having to be physically sharded, so the development and IT departments can independently decide when to do their part of the job.
  • Support for multiple sharding keys.
    Most of our data will probably be sharded on $userID, but we want the system to be flexible so we can implement other keys and/or sharding schemes too.
  • Easy to understand.
    We can't expect each and every of our developers to know everything about scalability and performance. Even if this was the case, the API to access and store data in a sharded environment should make it transparent to them so they shouldn't care about performance and can focus on what's really fun to do: developing and improving on features.
    So, it's best if the API is a php API which makes it easy for them to use in the rest of our application.

Sharding Implementation at Netlog

So, what did we come up with? An in-house solution, written 100% in php. The implementation is mostly middleware between application logic and the database class. We've got a complete caching layer built in (using memcached). Since our site is mainly built around profiles, most of the data is sharded on $userID.

In this system we are using the shard scheme below, where a shard is identified by a unique number ($shardID) that also serves as a prefix for the tables in the sharding system. Several shards (groups of tables) sit together in a "shard database", and several of those databases (not instances) are on a certain "shard database host".
So a host has more than one shard. This allows us to move shards as a whole, or databases as a whole, to help in balancing all the servers in the pool, and it allows us to play with the number of shards in a database and the number of shards on a server to find the right balance between table size and open files for that server.
When we started using this system in production, we had 4000 shards on 40 hosts. Today we've got 80 hosts in the pool.

Shards live in databases, databases live on hosts

From the php side there are two parts to the implementation. The first is a series of management and maintenance related functions that allow us to add, edit and delete shards, databases and hosts in the system, plus a lookup system. The second is a series of classes providing an API that consists of a database access layer and a caching layer.

The Sharding Management Directory

The directory or lookup system is in fact a single MySQL table translating shard keys to $shardIDs. Typically these are $userID-$shardID combinations. This is a single table with the amount of records being the number of users on Netlog. With only id's saved in that table it's still manageable and can be kept very performant through master-to-master-replication, memcached and/or a cluster set-up.
Next to that there's a series of configuration files that translate $shardIDs to actual database connection details. These configuration files allow us to flag certain shards as not available for read and/or write queries. (Which is interesting for maintenance purposes or when a host goes down.)
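A simplified sketch of that two-step lookup (a hypothetical reimplementation, not Netlog's actual code), going from $userID to $shardID via the directory table and from $shardID to connection details via configuration:

PHP:
    // Directory lookup: which shard does this user live on?
    function getShardID($userID, PDO $directoryDb, Memcached $cache)
    {
        $shardID = $cache->get('user2shard-' . $userID);
        if ($shardID === false) {
            $stmt = $directoryDb->prepare('SELECT shardID FROM user_shards WHERE userID = ?');
            $stmt->execute(array($userID));
            $shardID = (int) $stmt->fetchColumn();
            $cache->set('user2shard-' . $userID, $shardID, 0);   // updated whenever a user moves
        }
        return $shardID;
    }

    // Configuration lookup: which host/database holds this shard, and is it writable?
    function getShardConnectionDetails($shardID, array $shardConfig)
    {
        return $shardConfig[$shardID];   // e.g. array('host' => ..., 'db' => ..., 'readonly' => false)
    }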

The Sharded Tables API

Note: The API we implemented allows for handling more than the typical case I'll discuss next, and also allows for several caching modes and strategies based on the nature and use of its application.

Most records and data in the shard system have both a $userID field and an $itemID field. This $itemID is a $photoID for tables related to photos or $videoID for tables related to videos. (You get the picture ...) The $itemID is sometimes an auto_increment value, or a foreign key and part of a combined primary key with $userID. Each $itemID is thus unique per $userID, and not globally unique, because that would be hard to enforce in a distributed system.

(If you use an auto_increment value in a combined key in MySQL, this value is always a MAX()+1 value, and not an internally stored value. So if you add a new item, delete it again, and insert another record, the auto_increment value of that last insert will be the same as the previously inserted and deleted record. Something to keep in mind ...)

If we want to access data stored in the sharding system we typically create an object representing a table+$userID combination. The API provides all the basic CRUD (Create/Read/Update/Delete) functionalities typically needed in our web app. If we go back to the first example of fetching blog messages by a certain author we come to the following scenario;

Query: Give me the blog messages from author with id 26.

  1. Where is user 26?
    User 26 is on shard 5.
  2. On shard 5; Give me all the $blogIDs ($itemIDs) of user 26.
    That user's $blogIDs are: array(10,12,30);
  3. On shard 5; Give me all details about the items array(10,12,30) of user 26.
    Those items are: array(array('title' => "foo", 'message' => "bar"), array('title' => "milk", 'message' => "cow"));

In this process step 1 is executed on a different server (the directory db) than steps 2 and 3 (shard 5). Steps 2 and 3 could easily be combined into one query, but there's a reason why we don't do it, which I'll explain when discussing our caching strategy.
It's important to note that the functionality behind step 2 allows for adding WHERE, ORDER and LIMIT clauses so you can fetch only the records you need in the order you need.

(One could argue that for the example given here and the way we are using MySQL here, it's not needed to have a relational database and you could try to use simpler database systems. While that could be the case, there are still advantages in using MySQL, for cases where you're bypassing this API. It's not that bad to have all your data in the same format either, sharded or not. The possible overhead of still using MySQL hasn't been the bottleneck for us today, but it is certainly something we might consider improving on.)

Shard Management

To keep the servers in the sharding system balanced we are monitoring several parameters such as number of users, filesize of tables and databases, amount of read and write queries, cpu load, etc. Based on those stats we can make decisions to move shards to new or different servers, or even to move users from one shard to another.
Move operations of single users can be done completely transparently and online without that user experiencing downtime. We do this by monitoring write queries. If we start a move operation for a user, we start copying his data to the destination shard. When a write query is executed for that user, we abort the move process, clean up and try again later. So a move of a user will be successful if the user himself/herself isn't active at that time, or if no other user is interacting with him/her (for features in the sharding system).
Moving a complete shard or database at a time is a more drastic approach to balancing the load of servers and requires some downtime which we can keep to a minimum by configuring shards as read only / using a master-slave setup during the switch, etc.

Inherent to this system is that if one database goes down, only the users (or interactions with the users) on that database are affected. We (can) improve the availability of shards by introducing clusters, master-master setups or master-slave setups for each shard, but the chance of a shard database server being in trouble is slim to none because of the minor load on shard db's compared to the pre-sharding era.

Tackling the problems

The difficulties of sharding are partially tackled by implementations with these 3 technologies: Memcached, parallel processing and Sphinx.

Memcached

"memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load." By putting a memory caching layer in between our application logic and the SQL-queries to our shard database we are able to get results much, much faster. This caching layer also allows us to do some of the cross-shard data fetching, previously thought impossible on SQL-level.

For those unfamiliar with memcached, below is a very simple and stripped-down example of Memcached usage where we try to fetch a key-value pair from the cache and if it's not found we compute the value and store it into the system so a subsequent call to the function will instantly return the cached value without stressing the database.

PHP:
    function isObamaPresident()
    {
        $memcache = new Memcache();
        $result = $memcache->get('isobamapresident'); // fetch
        if ($result === false)
        {
            // do some database heavy stuff
            $db = DB::getInstance();
            $votes = $db->prepare("SELECT COUNT(*) FROM VOTES WHERE vote = 'OBAMA'")->execute();
            $result = ($votes > (USA_CITIZEN_COUNT / 2)) ? 'Sure is!' : 'Nope.'; // well, ideally
            $memcache->set('isobamapresident', $result, 0);
        }
        return $result;
    }

 

Memcached is being used in several ways and on several levels in our application code, and for sharding the main ones include;

  • Each $userID to $shardID call is cached. This cache has a hit ratio of about 100% because every time this mapping changes we can update the cache with the new value and store it in the cache without a TTL (Time To Live).
  • Each record in sharded tables can be cached as an array. The key of the cache is typically tablename + $userID + $itemID. Every time we update or insert an "item" we can also store the given values into the caching layer, making for a theoretical hit-ratio of again 100%.
  • The results of "list" and "count" queries in the sharding system are cached as arrays of $itemIDs or numbers with the key of the cache being the tablename + $userID (+ WHERE/ORDER/LIMIT-clauses) and a revision number.

The revision numbers for the "list" and "count" caches are themselves cached numbers that are unique for each tablename + $userID combination. These numbers are then used in the keys of "list" and "count" caches, and are bumped whenever a write query for that tablename + $userID combination is executed. The revision number is in fact a timestamp that is set to "time()" when updated or when it wasn't found in cache. This way we can ensure all data fetched from cache will always be the correct results since the latest update.
If, with this in mind, we again return to the blog example, we get the following scenario.

Query: Give me the blog messages from author with id 26.

  1. Where is user 26?
    The result of this query is almost always available in memcached.
  2. On shard 5; Give me all the $blogIDs ($itemIDs) of user 26.
    The result of this query is found in cache if it has been requested before since the last time an update to the BLOGS-table for user 26 was done.
  3. On shard 5; Give me all details about the items array(10,12,30) of user 26.
    The results for this query are almost always found in cache because of the big hit-ratio for this type of cache. When fetching multiple items we make sure to do a multi-get request to optimize traffic from and to Memcached.

Because of this caching strategy the two separate queries (list query + details query), which seemed a stupid idea at first, result in better performance. If we hadn't split this up into two queries and had cached the list of items with all their details (message + title + ...) in Memcached, we'd store many more copies of the record's properties.

There is an interesting performance tweak we added to the "list" caches. Let's say we request a first page of comments (1-20); we actually query for the first 100 items, store that list of 100 in cache and then only return the requested slice of that result. A likely following call for the second page (21-40) will then always be fetched from cache. So the window we ask from the database is different from the window requested by the app.
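A rough sketch of that windowing tweak; the window size of 100 follows the text, everything else is assumed:

PHP:
    // Return one page of item IDs, but cache a larger window (100 IDs)
    // so that the next page can be served from cache as well.
    function getListPage($cacheKey, $offset, $limit, Memcached $cache, callable $fetchIDs)
    {
        $window = $cache->get($cacheKey);
        if ($window === false) {
            $window = $fetchIDs(0, 100);            // ask the database for a bigger window than requested
            $cache->set($cacheKey, $window, 0);     // the revision baked into $cacheKey keeps it fresh
        }
        return array_slice($window, $offset, $limit);
    }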

For features where caching race conditions might be a problem for data consistency, or for use cases where caching each record separately would be overhead (eg. because the records are only inserted and selected and used for 1 type of query), or for use cases where we do JOINs and more advanced SQL queries, we use different caching modes and/or different API-calls.

This whole API requires quite some php processing that we are now doing on the application level, where previously this was all handled and optimized by the MySQL server itself. Memory usage and processing time on the php level scale a lot better than databases though, so this is less of an issue.

Parallel processing

It is not strange to fetch data stored on different shards in one go, because most data is probably available from memory. If we fetch a friends-of-friends list, one way to do this could be to fetch your own friends, loop over them and fetch their friends, and then process those results to get a list of people your friends know but you don't know yet.
The amount of actual database queries needed for this will be small, and even so, the queries are simple and super fast. Problems start to occur if we are processing this for users who have a couple of hundred friends each. For this we've implemented a system for splitting up certain big tasks into several smaller ones we can process in parallel.
This parallel processing in php is done by doing several web requests to our php server farm that each process a small part of the task. It is actually faster to process 10 smaller tasks simultaneously than to do the whole thing at once. The overhead of the extra web requests and cpu cycles it takes to split up the task and combine the results, are irrelevant compared to the gain.

Using Sphinx

Other typical queries that become impossible for sharded data are overview queries. Say you'd like a page of all the latest photos uploaded by all users. If you have your users' photos distributed over a hundred databases, you'd have to query each of them and then process all of those results. Doing that for several features would not be justifiable, so most of our "Explore" pages (where you browse through and discover content from the community) are served from a different system.
Sphinx is a free and open source SQL full-text search engine. We use it for more than your average input field + search button search engine. In fact a list of most viewed videos of the day, can also be a query result from Sphinx. For most of the data on these overview pages it's not a problem if the data isn't real time. So it's possible to retrieve those results from indexes that are regularly built from the data on each shard and then combined.

For a full overview of how we use Sphinx (and how we got there), I encourage you to have a look at the presentation of my colleague Jayme Rotsaert, "Scaling and optimizing search on Netlog", who's put a lot of effort into using Sphinx.

Final thoughts

If there are only two things I could say about sharding it'd be these two quotes;

  • "Don't do it, if you don't need to!" (37signals.com)
  • "Shard early and often!" (startuplessonslearned.blogspot.com)

Sounds like saying two opposite things? Well, yes and no.

You don't want to introduce sharding in your architecture, because it definitely complicates your set-up and the maintenance of your server farm. There are more things to monitor and more things that can go wrong. 
Today, there is no out-of-the-box solution that works for every set-up, technology and/or use case. Existing tool support is poor, and we had to build quite some custom code to make it possible.
Because you split up your data, you lose some of the features you've grown to like from relational databases.
If you can do with simpler solutions (better hardware, more hardware, server tweaking and tuning, vertical partitioning, sql query optimization, ...) that require less development cost, why invest lots of effort in sharding?

On the other hand, when your visitor statistics really start blowing through the roof, it is a good direction to go. After all, it worked for us.
The hardest part about implementing sharding, has been to (re)structure and (re)design the application so that for every access to your data layer, you know the relevant "shard key". If you query details about a blog message, and blog messages are sharded on the author's userid, you have to know that userid before you can access/edit the blog's title.
Designing your application with this in mind ("What are the possible keys and schemes I could use to shard?"), will definitely help you to implement sharding more easily and incrementally at the moment you might need to.

In our current set-up not everything is sharded. That's not a problem though. We focus on those features that require this scaling strategy, and we don't spend time on premature optimization.
Today, we're spending less ca$h on expensive machines, we've got a system that is available, it can handle the traffic and it scales.


Resources

For further questions or remarks, feel free to contact me at jurriaan@netlog.com and subscribe to my blog at www.jurriaanpersyn.com and the Netlog developer blog at www.netlog.com/go/developer/blog.


Reposted from: http://www.cnblogs.com/blockcipher/archive/2013/05/07/3064692
